SOME RESULTS IN BHATTACHARYYA DISTANCE-BASED LINEAR
DISCRIMINATION AND IN DESIGN OF SIGNALS


A Thesis Submitted 

in Partial Fulfilment of the Requirements
for the Degree of 

DOCTOR OF PHILOSOPHY 


by 

GOPAL CHAUDHURI 


to the 

DEPARTMENT OF MATHEMATICS 

INDIAN INSTITUTE OF TECHNOLOGY KANPUR 

DECEMBER, 1989 



To 


The loving memory 
of my parents 



CERTIFICATE 



Certified that this work, entitled "SOME RESULTS IN
BHATTACHARYYA DISTANCE-BASED LINEAR DISCRIMINATION AND IN DESIGN
OF SIGNALS" by Gopal Chaudhuri, has been carried out under our
supervision and has not been submitted elsewhere for a degree.


P.R.K. Rao
Professor
Department of Electrical Engineering
Indian Institute of Technology
Kanpur

J.D. Borwanker
Professor
Department of Mathematics
Indian Institute of Technology
Kanpur

December 15, 1989




ACKNOWLEDGEMENTS 


The author wishes to record here his deep gratitude
to Professor J.D. Borwanker of the Department of Mathematics
and Professor P.R.K. Rao of the Department of Electrical
Engineering for their invaluable guidance and keen interest
during the course of this work.

The author is grateful to Drs. P.C. Das, G.K. Shukla,
S. Madan and P. Dutta, with whom he had many useful discussions.

The author is indebted to his family and other friends 
for keeping his spirits high. 

The author also extends thanks to Mr. Ashok Kumar Bhatia 
for having typed this thesis. 


December, 1989 








GOPAL CHAUDHURI 



CONTENTS

LIST OF FIGURES
LIST OF TABLES
SOME NOTATIONS AND SYMBOLS
SYNOPSIS

CHAPTER I    INTRODUCTION
    1.1  Problem
    1.2  Review of Previous Work
    1.3  Summary

CHAPTER II   PRELIMINARIES
    2.1  Introduction
    2.2  Statistical Distance Measures
    2.3  Bhattacharyya Distance
    2.4  Stochastic Processes: Some Definitions and Results
    2.5  Complex Normal Processes

CHAPTER III  LINEAR DISCRIMINANT FUNCTIONS FOR DISCRETE TIME SERIES
    3.1  Introduction
    3.2  Mathematical Formulation of the Problem
    3.3  Method for Obtaining α in the Discriminant Function
         3.3.1  An Example
         3.3.2  Convergence of the Iteration Process
         3.3.3  Some Special Categories of Problems
         3.3.4  Other Linear Discriminant Functions
         3.3.5  Comparison of the Various Linear Discriminant Functions
    3.4  Comparison of the Behaviour of the LDF Obtained by Maximizing
         the Bhattacharyya Distance with the Quadratic (Optimal)
         Discriminant Function when R_1 = dR_2
         3.4.1  Numerical Results
    3.5  Two Classes of Tests (Modified Minimax Rule)
         3.5.1  An Illustrative Example
    3.6  Method of Obtaining α in the Case of Large Sample for
         Stationary Time Series
         3.6.1  Expression for the Optimal α
         3.6.2  Examples
    3.7  Conclusion

CHAPTER IV   LINEAR DISCRIMINANT FUNCTIONS FOR CONTINUOUS-TIME SERIES
    4.1  Introduction
    4.2  Second Order Time Series
    4.3  When the Process is Stationary
    4.4  Conclusion

CHAPTER V    DESIGN OF SIGNALS
    5.1  Introduction
    5.2  Mathematical Statement of the Problem of Signal Selection
    5.3  Methods of Signal Selection
    5.4  Numerical Results
    5.5  Conclusion

CHAPTER VI   LINEAR DISCRIMINANT FUNCTIONS AND DESIGN OF SIGNALS FOR
             COMPLEX NORMAL TIME SERIES
    6.1  Introduction
    6.2  Formulation of the Problem
    6.3  Discrete Time Series
         6.3.1  Method of Finding the Alpha Vector for an Arbitrary
                Time Series
         6.3.2  Method for Finding α in the Case of Large Sample for
                Covariance Stationary Time Series
         6.3.3  Some Special Categories of Problems
    6.4  Continuous Time Series
         6.4.1  Second Order Time Series
         6.4.2  Covariance Stationary Time Series with Index Set [0, ∞)
         6.4.3  Some Special Categories of Problems
    6.5  Design of Signals
    6.6  Conclusion

CHAPTER VII  CONCLUSIONS AND SUGGESTIONS FOR FURTHER WORK

APPENDIX
    A  BHATTACHARYYA DISTANCE AND THE TRIANGLE INEQUALITY
       COMBINING TWO QUADRATIC FORMS
       FLOW CHART OF THE ITERATION
       SOLUTIONS OF EQUATIONS BY GRAEFFE'S METHOD
       TWO LEMMAS
       SERIES REPRESENTATION OF A STOCHASTIC PROCESS
       THE DIRAC DELTA FUNCTION
       DIFFERENTIATION WITH RESPECT TO A MATRIX
       INTEGRATION OF COMPLEX STOCHASTIC PROCESSES

REFERENCES



LIST OF FIGURES

Figure 3.1   Graph of Ψ'(θ)
Figure 3.2   Example 3.3.5 (AR(1)), n = 2
Figure 3.3   Example 3.3.5 (AR(1)), n = 3
Figure 3.4   Example 3.3.5 (AR(1)), n = 4
Figure 3.5   Example 3.3.5 (AR(1)), n = 5
Figure 3.6   Example 3.3.5 (AR(1)), n = 10
Figure 3.7   Example 3.3.5 (AR(1)), n = 20
Figure 3.8   Example 3.3.6 (AR(1)), n = 2
Figure 3.9   Example 3.3.6 (AR(1)), n = 20
Figure 3.10  Example 3.3.7 (AR(2)), n = 2
Figure 3.11  Example 3.3.7 (AR(2)), n = 3
Figure 3.12  Example 3.3.7 (AR(2)), n = 4
Figure 3.13  Example 3.3.7 (AR(2)), n = 5
Figure 3.14  Example 3.3.7 (AR(2)), n = 10
Figure 3.15  Example 3.3.7 (AR(2)), n = 20
Figure 3.16  Example 3.3.8 (AR(2)), n = 2
Figure 3.17  Example 3.3.8 (AR(2)), n = 20
Figure 3.18  Example 3.3.9 (AR(2)), n = 2
Figure 3.19  Example 3.3.9 (AR(2)), n = 20
Figure 3.20  Comparison between LDF and QDF
Figure 3.21  Comparison between LDF and QDF
Figure 3.22  Comparison between LDF and QDF
Figure 3.23  Example 3.6.1 (AR(2))
Figure 3.24  Example 3.6.2 (AR(2))
Figure 3.25  Example 3.6.3 (AR(2))
Figure 3.26  Example 3.6.4 (AR(2))
Figure 3.27  Example 3.6.5 (AR(1))
Figure 3.28  Example 3.6.6 (AR(1))
Figure 3.29  Example 3.6.7 (AR(1))

LIST OF TABLES

Table 3.1   Verification in Example 3.3.2
Table 3.2   Comparison in Example 3.3.1
Table 3.3   Type II errors $e_2$ resulting from the use of LDFs obtained
            by maximizing $-\ln \rho_2(1,2;y)$ in Examples 3.3.4, 3.3.5, 3.3.6
Table 3.4   Type II errors $e_2$ resulting from the use of LDFs obtained
            by maximizing $I(1,2;y)$ in Examples 3.3.4, 3.3.5, 3.3.6
Table 3.5   Type II errors $e_2$ resulting from the use of LDFs obtained
            by maximizing $I(2,1;y)$ in Examples 3.3.4, 3.3.5, 3.3.6
Table 3.6   Type II errors $e_2$ resulting from the use of LDFs obtained
            by maximizing $J(1,2;y)$ in Examples 3.3.4, 3.3.5, 3.3.6
Table 3.7   Type II errors $e_2$ resulting from the use of LDFs obtained
            by the A-B procedure in Examples 3.3.4, 3.3.5, 3.3.6
Table 3.8   Results due to $-\ln \rho_2(1,2;y)$ in Examples 3.3.7, 3.3.8, 3.3.9
Table 3.9   Results due to $I(1,2;y)$ in Examples 3.3.7, 3.3.8, 3.3.9
Table 3.10  Results due to $I(2,1;y)$ in Examples 3.3.7, 3.3.8, 3.3.9
Table 3.11  Results due to $J(1,2;y)$ in Examples 3.3.7, 3.3.8, 3.3.9
Table 3.12  Results due to the A-B procedure in Examples 3.3.7, 3.3.8, 3.3.9
Table 3.13  Admissibility in Example 3.3.1
Table 3.14  Polynomial method in Example 3.3.1
Table 3.15  The probability of misclassification resulting from the use
            of LDF based on Bhattacharyya distance
Table 3.16  The probability of misclassification obtained by using QDF
            (n = 1, 2)
Table 3.17  The probability of misclassification obtained by using QDF
            (n = 6, n = 10)
Table 3.18  Computations in Example 3.6.1
Table 3.19  Computations in Example 3.6.2
Table 5.1   Computation in Example 5.4.1
Table 5.2   Computation in Example 5.4.2

SOME NOTATIONS AND SYMBOLS

ABS                    absolute value
E                      expectation operator
exp                    exponential
sup                    supremum
max                    maximum
min                    minimum
$\lambda_{\max}(A)$    maximum eigenvalue of A
$\lambda_{\min}(A)$    minimum eigenvalue of A
$N_n(\cdot,\cdot)$     n-dimensional multivariate normal distribution
$\overline{\lim}$      upper limit
$\underline{\lim}$     lower limit
Cov                    covariance
$\Psi'(x)$             derivative of $\Psi(x)$ with respect to x
p.d.                   positive definite
$\Phi(x)$              standard normal distribution function
$\alpha^*$             optimal $\alpha$, also sometimes denoted by
$a^*$                  complex conjugate-transpose of a
$\approx$              approximately
$\triangleq$           defined by
$|A|$                  determinant of a matrix A
~ (at bottom)          a vector
$\pi$                  statistical population
$A'$                   transpose of a matrix A
e.v.                   eigenvector
$\bar{a}$              complex conjugate of a
$[a,b]$                a closed interval
$L_2$                  space of square integrable functions on a specified
                       interval
$\in$                  belongs to
Re(z)                  real part of z
Im(z)                  imaginary part of z
$|x|$                  absolute value of x
$\Sigma$               summation sign
$\to$                  converges to
$d(x,y)$               distance between x and y
$\delta(\cdot)$        Dirac delta function
$\delta_{nm}$          Kronecker delta
$\Rightarrow$          implies
$X \sim$               X is distributed as
$\Leftrightarrow$      if and only if
$\ni$                  such that



SYNOPSIS

GOPAL CHAUDHURI
DEPARTMENT OF MATHEMATICS

SOME RESULTS IN BHATTACHARYYA DISTANCE-BASED LINEAR
DISCRIMINATION AND IN DESIGN OF SIGNALS

1. DR. J.D. BORWANKER
2. DR. P.R.K. RAO

DECEMBER, 1989

In many practical situations it is of interest to
classify a normal time series as belonging to one or the
other of two categories described by two hypotheses. The
admissible procedures for classification provided by the
Neyman-Pearson theory and by Bayes' rule are based
on the likelihood ratio. In the case of unequal covariance
matrices this likelihood ratio depends on a quadratic function
of the observations. Unfortunately, the distribution theory
pertaining to the quadratic part of this classification rule
is extremely complicated. It involves a weighted sum of
non-central chi-square random variables, so that computing
the error rates resulting from its use is difficult. Hence
the usual approach has been to consider a linear procedure.

In this work we study optimal classification rules based on
linear statistics which maximize the Bhattacharyya distance.





The present thesis is divided into seven chapters. A 
brief review of the previous work done in the area of two-group 
classification relevant to our discussion, and a chapter-wise 
outline of this thesis are given in Chapter I. 

Chapter II provides some definitions and elementary results 
needed in subsequent chapters of the present work. 

Chapter III makes a systematic study of the
optimal classification rules based on linear statistics which
maximize the Bhattacharyya distance (B-D) in the case when the
observed process is discrete in time. Both stationary and
non-stationary cases have been considered. For an arbitrary
time series, we have to solve iteratively an implicit equation
in order to get our desired linear discriminant function (LDF);
we have given a simple method for ascertaining an interval of
convergence of the iteration process. Linear procedures based
on the B-D belong to the Anderson-Bahadur admissible class for
a proper choice of the cut-off point. Some special categories
of problems where the mean vectors and the covariance matrices
are of a specific kind have also been studied here. We compare
the performance of our LDF with that of some other LDFs considered
in the literature. The comparison with the quadratic discriminant
function due to the Bayes' criterion is considered when the covariance
matrices are proportional. These comparisons lead to the
conclusion that the distance of our interest is worth considering.
A one-to-one correspondence has been established between two
classes of linear procedures: one due to our criterion of
maximizing the B-D, and another due to minimizing the total
probability of misclassification subject to a linear relationship
between the two types of errors. Under certain regularity
conditions, a compact form of the LDF is obtained for a covariance
stationary time series when the sample size is large. Some
illustrative examples satisfying the regularity conditions are
given and it is shown that the errors of misclassification tend
to zero asymptotically.

Chapter IV deals with continuous time series. It is
shown that finding the optimal LDF amounts to solving an integral
equation of Fredholm type. We are able to obtain a compact form
of the LDF in the case when the time series is covariance stationary
and the observation interval is infinite.

Chapter V is devoted to the design of signals. It is shown
that even in the simple case when the two types of errors are
made equal we are unable to obtain an explicit expression for
the optimum signal. An analytical solution for the optimal
signal is, however, available through a bound on the total
probability of misclassification.

As complex normal processes are of interest in many
applied areas, in Chapter VI all the major results of the
preceding chapters have been extended to the case where the
underlying process is complex valued. Here too, some special
categories of problems of practical interest have been discussed.

The concluding Chapter VII includes some suggestions for 
further investigation in this interesting field. 


CHAPTER I 


INTRODUCTION 


1.1 PROBLEM

The problem of classification, also known as discrimination
or identification in statistical literature, arises when an
observation is to be classified as coming from one of several
categories or populations characterized by their respective
probability distributions. In many cases it can reasonably
be assumed that there is a finite number of populations
from which the observation could have come.

The problem of classification may be viewed as a problem
of statistical decision functions ([3], Chapter 6). We have
a number of hypotheses: each hypothesis is that the distribution
of the observation is a given one. We must accept one of
these hypotheses and reject the others. If we are concerned
with only two populations, we have an elementary problem
of testing one simple hypothesis against another simple
one.


In developing a classification rule, distributional
assumptions may be:

1. that the probability distributions of the observation 
under the populations are completely known, 

2. that the functional form of the distributions is 
known but the parameters are unknown, or 

3. that nothing whatever is known about the distributions. 


In this work we consider the classification problem 
in the case of two multivariate normal distributions with 
different mean vectors and covariance matrices. We assume 
all parameters known. 

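The two distributions of the random vector $x$ of $n$
components are denoted by $N_n(\mu_1, R_1)$ and $N_n(\mu_2, R_2)$, where
$\mu_1$, $\mu_2$ are the mean vectors and $R_1$ and $R_2$ are the covariance
matrices of the first and second populations, respectively;
the density of the $j$th distribution ($j = 1,2$) is

$$ p_j(x) = (2\pi)^{-n/2} |R_j|^{-1/2} \exp\{ -\tfrac{1}{2} (x - \mu_j)' R_j^{-1} (x - \mu_j) \}. \qquad (1.1.1) $$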


The theoretically best procedures for classification
(or, alternatively, for testing the null hypothesis of
one distribution against the alternative hypothesis of the
other distribution) are based on the likelihood ratio
$p_2(x)/p_1(x)$; one classifies into the first population if
this ratio (for a given observation $x$) is less than a
constant and into the second otherwise. If $R_1 = R_2$,
the likelihood ratio depends on a linear function of
$x$ (called Fisher's discriminant function), but if
$R_1 \neq R_2$ the ratio depends on a quadratic function of $x$ ([21]).
In particular, in the univariate case, the logarithm of
the likelihood ratio is

$$ \ln\frac{\sigma_1}{\sigma_2} + \frac{1}{2}\Big(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\Big) x^2 - \Big(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2}\Big) x + \frac{1}{2}\Big(\frac{\mu_1^2}{\sigma_1^2} - \frac{\mu_2^2}{\sigma_2^2}\Big), \qquad (1.1.2) $$


where $R_1 = \sigma_1^2$ and $R_2 = \sigma_2^2$. If $\sigma_2^2 > \sigma_1^2$, the coefficient
of $x^2$ is positive, and the set of $x$'s for which (1.1.2)
is less than a constant is a finite interval. The procedure
is to classify an observation as coming from the first
population if it falls in this interval and as coming from
the second if it falls outside (i.e. if the observation is
sufficiently small or large). In the bivariate case, the
regions are defined by conic sections; for example, the
region of classification into one population might be the
interior of an ellipse or the region between two hyperbolas.
In general, the regions are defined by means of a quadratic
function of the observations which is not necessarily
a positive definite quadratic form. These procedures
depend very much on the assumption of normality and in
particular on the shape of the normal distribution relative
to its center. For instance, in the univariate case cited
above, the region of classification into the first population
is a finite interval because the density of the first
population falls off in either direction more rapidly than
the density of the second since its standard deviation is
smaller.
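
The univariate rule just described can be sketched numerically. The following is a minimal illustration, assuming all parameters are known; the function names and the `log_threshold` argument are hypothetical and do not appear in the thesis. It expands the logarithm of the likelihood ratio of two univariate normal densities as a quadratic in x and, when the second standard deviation is the larger one, returns the finite interval in which an observation is assigned to the first population.

```python
import math


def log_lr_coefficients(mu1, sigma1, mu2, sigma2):
    """Coefficients (c2, c1, c0) with ln[p2(x)/p1(x)] = c2*x**2 + c1*x + c0."""
    c2 = 0.5 * (1.0 / sigma1**2 - 1.0 / sigma2**2)
    c1 = -(mu1 / sigma1**2 - mu2 / sigma2**2)
    c0 = math.log(sigma1 / sigma2) + 0.5 * (
        mu1**2 / sigma1**2 - mu2**2 / sigma2**2
    )
    return c2, c1, c0


def first_population_interval(mu1, sigma1, mu2, sigma2, log_threshold=0.0):
    """Interval of x on which ln[p2(x)/p1(x)] < log_threshold.

    Assumes sigma2 > sigma1, so that the quadratic opens upwards and the
    region assigned to the first population is a finite interval.
    Returns None when the region is empty.
    """
    c2, c1, c0 = log_lr_coefficients(mu1, sigma1, mu2, sigma2)
    if c2 <= 0.0:
        raise ValueError("this sketch assumes sigma2 > sigma1")
    disc = c1**2 - 4.0 * c2 * (c0 - log_threshold)
    if disc <= 0.0:
        return None
    root = math.sqrt(disc)
    return (-c1 - root) / (2.0 * c2), (-c1 + root) / (2.0 * c2)


if __name__ == "__main__":
    # Equal means, second population twice as dispersed: a symmetric interval.
    print(first_population_interval(0.0, 1.0, 0.0, 2.0))
```

With equal means the interval is symmetric about the common mean, which matches the remark that observations that are sufficiently small or sufficiently large go to the second population.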

In a situation where the two populations are centered
around different points and have different patterns of
scatter, and where one considers multivariate normal
distributions to be reasonably good approximations for
these two populations, one may want to divide the sample
space into two regions of classification by some simple
curve or surface. The simplest one is a line or a hyperplane;
the procedure may then be termed linear. The formal definition
of a linear procedure is given in Chapter III. It is useful
to consider linear procedures because the distributional
problems associated with the quadratic classification function
are very complicated ([59]) (in actual practice the error rates
are found through bounds; see [63]) and even its implementation
is difficult. We study an optimal classification rule based
on linear statistics which maximizes the Bhattacharyya distance.
We define the Bhattacharyya distance in Chapter II.


One naturally asks: why did we pick the Bhattacharyya
distance for our analysis? The points which motivated us to
examine the consequences of maximizing the Bhattacharyya distance
for linear functions are the following:

1. In the context of control theory, Schweppe ([26]) made
the following remark: "readers who subscribe to the theory
that 'the best answer is the simplest answer' may decide that
the Bhattacharyya distance is superior to the Kullback-Leibler
distance".

2. In the study of the problem of signal selection (we define
it shortly) when the covariances are equal, the Bhattacharyya
distance has been employed successfully (see [26,48]).

3. If one uses the Bayes criterion for classification and attaches
equal costs to each type of misclassification, then it is shown
by Matusita ([39]) that the total probability of misclassification
is majorized by exp{-B}, where B denotes the Bhattacharyya distance.

4. In the case of equal covariances, the maximization of the
Bhattacharyya distance yields Fisher's discriminant function.
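
Point 4 can be checked numerically. The sketch below is an illustration under stated assumptions, not the method developed later in the thesis: it uses the standard closed form of the Bhattacharyya distance between two univariate normal distributions, applies it to the linear statistic a'x (which has mean a'mu_j and variance a'R_j a under population j), and compares a grid of directions against the Fisher direction R^{-1}(mu_1 - mu_2) for a two-dimensional example with equal covariance matrices. All function names and the example parameters are hypothetical.

```python
import math


def bhattacharyya_univariate(m1, v1, m2, v2):
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2) (v = variance)."""
    return (m1 - m2) ** 2 / (4.0 * (v1 + v2)) + 0.5 * math.log(
        (v1 + v2) / (2.0 * math.sqrt(v1 * v2))
    )


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def quad_form(a, R):
    """Quadratic form a' R a for a square matrix R given as nested lists."""
    n = len(a)
    return sum(a[i] * R[i][j] * a[j] for i in range(n) for j in range(n))


def bhattacharyya_linear(a, mu1, R1, mu2, R2):
    """Distance between the two normal laws of the linear statistic a'x."""
    return bhattacharyya_univariate(
        dot(a, mu1), quad_form(a, R1), dot(a, mu2), quad_form(a, R2)
    )


def solve_2x2(R, d):
    """Solve R a = d for a 2x2 matrix R by Cramer's rule."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return [
        (R[1][1] * d[0] - R[0][1] * d[1]) / det,
        (R[0][0] * d[1] - R[1][0] * d[0]) / det,
    ]


# Hypothetical example with equal covariance matrices.
mu1, mu2 = [1.0, 0.0], [0.0, 0.0]
R = [[2.0, 1.0], [1.0, 2.0]]

fisher_direction = solve_2x2(R, [mu1[0] - mu2[0], mu1[1] - mu2[1]])
fisher_b = bhattacharyya_linear(fisher_direction, mu1, R, mu2, R)

# Distance attained by unit vectors at one-degree steps over a half circle.
grid_values = [
    bhattacharyya_linear(
        [math.cos(k * math.pi / 180.0), math.sin(k * math.pi / 180.0)],
        mu1, R, mu2, R,
    )
    for k in range(180)
]

if __name__ == "__main__":
    print(fisher_b, max(grid_values))
```

In this equal-covariance example the logarithmic term vanishes and the distance reduces to (a'(mu_1 - mu_2))^2 / (8 a'Ra), so no direction on the grid exceeds the value attained by the Fisher direction.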

A problem closely related to the problem of classification
is the following. In our analysis the observed random vector $X$
constitutes a time series (a time series is a collection of
observations made sequentially in time). A fairly general
model ([5]) of a time series can be written as

$$ X(t) = M(t) + n(t), $$

where $M(t)$ is a completely deterministic process and $n(t)$ is
a stochastic process. They are sometimes called the "signal"
and "noise" processes respectively. We assume that $X$ is an
observation on the stochastic process $\{X(t), t \in T\}$.

Let

$$ X(t) = \begin{cases} M(t) + n_1(t), & \text{under } H_1, \\ n_2(t), & \text{under } H_2, \end{cases} $$

where the $n_j(t)$ are normal processes.
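
To make the two hypotheses concrete, the following sketch generates a discrete-time record under either one. It is only an illustration: the thesis requires the noise processes to be normal, while the first-order autoregressive form, the parameter names `phi1`, `phi2`, `sigma`, and the zero starting value are assumptions made here for simplicity.

```python
import random


def ar1_noise(n, phi, sigma, rng):
    """n samples of zero-mean Gaussian AR(1) noise, e[t] = phi*e[t-1] + w[t].

    The recursion starts from zero, so the first few samples are not yet
    drawn from the stationary distribution (a simplification of this sketch).
    """
    noise, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        noise.append(prev)
    return noise


def observe(signal, hypothesis, phi1, phi2, sigma, seed=0):
    """Record X(t): signal plus noise n1 under H1, noise n2 alone under H2."""
    rng = random.Random(seed)
    n = len(signal)
    if hypothesis == 1:
        noise = ar1_noise(n, phi1, sigma, rng)
        return [m + e for m, e in zip(signal, noise)]
    return ar1_noise(n, phi2, sigma, rng)


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 2.0, 1.0]
    print(observe(signal, 1, phi1=0.5, phi2=0.9, sigma=1.0))
    print(observe(signal, 2, phi1=0.5, phi2=0.9, sigma=1.0))
```

Using different autoregressive coefficients under the two hypotheses gives records whose mean vectors and covariance matrices both differ, which is the situation studied in this work.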


Our object is to minimize the Bayes risk or, if we attach
equal cost to the two types of errors, to minimize the total
probability of misclassification. For a given p., this
probability will also be a function of P, and one naturally asks:
which is the signal vector that minimizes the total probability
of error due to the Bayes optimal classification rule subject to
the condition P = 1? This problem is referred to as the
"design of signal" ([48]). The direct minimization of the
total probability of error is extremely difficult even
when $R_1 = R_2$. We thus adopt some signal selection criterion for
our purpose. We study the case when $R_1 \neq R_2$.


Besides this interesting area of signal selection, there
are a number of practical problems which reduce to classifying
a realization of a normal stochastic process as belonging
to one or the other of two categories, e.g. discriminating
between seismic records originating from earthquakes and
those originating from nuclear explosions. Applications
of time series discriminant analysis are not limited to
the physical sciences. The classification of individuals
using recorded brain waves is a potentially important
application in medicine.

1.2 REVIEW OF PREVIOUS WORK 

The classification problem is fairly old
and its development reflects the same broad phases as
those of general statistical inference, viz., a Pearsonian
stage followed by Fisherian, Neyman-Pearsonian and Waldian stages.

In early work, the classification problem was not precisely
formulated and was often considered as the problem of testing the
equality of two or more distributions. Various test statistics
were proposed in order to measure the distance between two
populations. It was Pearson who first proposed one such
statistic and termed it the "coefficient of racial
likeness" to ascertain the statistical distance between
two samples (K. Pearson, 1926, [46]). Dissatisfaction with
Pearson's coefficient led P.C. Mahalanobis to propose the
$D^2$-statistic as an alternative (Mahalanobis, 1936, [34]).
This was first successfully applied to discrimination problems
in anthropological studies, among others. The $D^2$-statistic
has become widely used because it is an actual measure of
metric distance between population centroids rather than
primarily a criterion for testing the null hypothesis of
zero distance. Here by populations we mean normal populations
with equal covariance matrices.

The first published accounts of what we now know as
discriminant analysis in the strictest sense were in
craniometrics by Barnard (1935, [8]) and Martin (1936, [35]).
These authors had the procedures suggested to them by Sir Ronald
Fisher, who is rightfully given the credit for developing
the discriminant function technique. In 1936 ([17]) Fisher
formally proposed the linear discriminant function as a solution
to a practical problem of achieving optimal separation of two
species of plants using a number of dependent variables. The
motivation for the use of the linear discriminant function
in multivariate populations came from Fisher's own idea in
the univariate case.

For the univariate case he suggested a rule which
classifies an observation $x$ into the $i$th univariate population if

$$ |x - \bar{x}_i| = \min\{ |x - \bar{x}_1|, |x - \bar{x}_2| \}, \quad i = 1, 2, $$

where $\bar{x}_i$ is the sample mean based on a sample of size
$n_i$ from the $i$th population. For an n-component observation
vector (n > 1), Fisher reduced the problem to the univariate
one by considering an optimum linear combination of the
n components. He obtained it by maximizing the ratio of
the difference of the expected values of a linear combination
under the two populations to its standard deviation. He then
used his univariate discrimination method with this optimum
linear combination of components as the random variable.
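
The two steps of this procedure (reduce to one dimension, then assign to the nearer sample mean) can be sketched as follows. This is a minimal illustration: the coefficient vector `a` is taken as given rather than optimized, ties are assigned to the first population by an arbitrary convention of this sketch, and all names are hypothetical.

```python
def project(a, x):
    """Value of the linear combination a'x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def classify_nearest_mean(x, a, sample1, sample2):
    """Assign x to population 1 or 2 by the nearest projected sample mean.

    sample1 and sample2 are lists of observation vectors drawn from the
    first and second populations respectively.
    """
    y = project(a, x)
    ybar1 = sum(project(a, s) for s in sample1) / len(sample1)
    ybar2 = sum(project(a, s) for s in sample2) / len(sample2)
    # Ties go to population 1 (a convention adopted only for this sketch).
    return 1 if abs(y - ybar1) <= abs(y - ybar2) else 2


if __name__ == "__main__":
    sample1 = [[0.0, 0.0], [2.0, 0.0]]
    sample2 = [[4.0, 1.0], [6.0, 1.0]]
    print(classify_nearest_mean([2.5, 9.0], [1.0, 0.0], sample1, sample2))
```

In the example the linear combination uses only the first component, so the second component of the observation does not affect the decision.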

The next stage of development of discriminant analysis
was influenced by Neyman and Pearson's pioneering fundamental
work (1936, [41]) in the theory of statistical inference.
Fisher's classification procedure received mathematical validity
when Welch (1939, [69]) showed that the identification aspect
of the Fisherian discriminant function was essentially an
application of the Neyman-Pearson likelihood ratio principle.
Actually he derived the form of the Bayes rule for discriminating
between two known multivariate populations with the same
covariance matrix; he illustrated the theory with multivariate
normal populations. This example was also taken up by
Wald (1944, [66]) when the parameters were unknown; replacing
the unknown parameters by their respective maximum likelihood
estimates, he studied the distribution of his proposed test
statistic.

The problem of classification of an observation into
two univariate normal populations with different variances
was studied by Cavalli (1945, [13]) and Penrose (1947, [47]).
The multivariate analog was treated by Smith (1947, [61]).
He suggested Monte Carlo methods for computing error rates.
He gave an example of bivariate normal distributions.

The rigorous mathematical treatment of the classification
problem was put forth in a series of papers by C.R. Rao (1946,
1947a,b, 1948, 1949a,b, 1950) ([49-55]). He mainly followed
the approach of Wald. The importance of these works of Rao
is obvious. In addition to refining and generalizing the
$D^2$-statistic, he suggested a measure of distance between two
populations, discussed the problem of doubtful regions where
definite decisions cannot be made and the generalization of
the classification problem to three or more groups. Rao's
development is for the case when the distributions are all
known. General theoretical results on the classification
problem in the framework of decision theory are given in
the book by Wald (1950, [67]) and in a paper by Wald and
Wolfowitz (1950, [68]).

In what follows we shall be concerned only with the
significant developments in the theory of classification problems
associated with two multivariate normal populations with unequal
covariance matrices. Kullback (1952, 1959, [30], [31]) considered
a rule based on a linear statistic which maximizes the
Kullback-Leibler distance between the two univariate normal
distributions of a linear statistic under the two hypotheses.
He also obtained some partial results on deriving the optimum
class of rules based on linear functions of observations from the
Neyman-Pearson viewpoint (i.e. minimizing one probability of
misclassification by controlling the other). Clunies-Ross
and Riffenburgh (1960, [16]) studied this problem geometrically.
Anderson and Bahadur (1962, [4]) derived the minimax rule and
characterized the minimal complete class after restricting to
the class of rules based on linear functions of observations.


The distribution of the quadratic discriminant function
was studied by Okamoto (1963, [42]) for the special case
^ » by Bartlett and Please (1963, [9]) for the special 

case ^2 ~ h2 ~ ^ 



I 




1 


Matusita (1967, [38]) suggested a minimum distance rule
(MDR). Let $X$ be the random variable under consideration, and
$S_n$ the empirical distribution based on $n$ observations on $X$.
Suppose that $X$ has one of the distributions $F_1$, $F_2$. Let
$S'$, $S''$ be the empirical distributions determined by observations
on $F_1$ and $F_2$ respectively. Then the decision rule he proposed
is the following:

(i) when $d(S_n, S') < d(S_n, S'')$, we decide on $F_1$;
(ii) when $d(S_n, S') > d(S_n, S'')$, we decide on $F_2$;
(iii) for the case $d(S_n, S') = d(S_n, S'')$, we determine in
advance to take either of $F_1$, $F_2$.

Here, $d(\cdot,\cdot)$ is a distance in the space of distributions
concerned. He took

$$ d(F_1, F_2) = \Big[ \int \big( \sqrt{p_1(x)} - \sqrt{p_2(x)} \big)^2 \, dm \Big]^{1/2}, $$

where the $F_i$ are defined on a common space, with p.d.f. $p_i$ with respect to
a σ-finite measure $m$. He studied separately the cases according
as the $\mu_j$'s and $R_j$'s are known or unknown, and obtained some
bounds on the probability of correct classification (using the MDR)
and on the total probability of misclassification resulting
from the use of the Bayes rule.
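
The distance used by Matusita can be evaluated numerically for two univariate normal densities. The sketch below is an illustration only: the midpoint-rule integration, the integration limits and the function names are assumptions of this example. It also computes the affinity, the integral of sqrt(p_1 p_2), and the two quantities are linked by the identity d^2 = 2(1 - affinity), which follows by expanding the square.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the univariate normal distribution N(mu, sigma**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def _midpoints(lo, hi, steps):
    """Midpoints of `steps` equal subintervals of [lo, hi], and their width."""
    h = (hi - lo) / steps
    return [lo + (k + 0.5) * h for k in range(steps)], h


def matusita_distance(p1, p2, lo, hi, steps=20000):
    """[ integral of (sqrt(p1) - sqrt(p2))**2 ]**0.5 by the midpoint rule."""
    xs, h = _midpoints(lo, hi, steps)
    total = sum((math.sqrt(p1(x)) - math.sqrt(p2(x))) ** 2 for x in xs)
    return math.sqrt(total * h)


def affinity(p1, p2, lo, hi, steps=20000):
    """Integral of sqrt(p1 * p2) by the midpoint rule."""
    xs, h = _midpoints(lo, hi, steps)
    return sum(math.sqrt(p1(x) * p2(x)) for x in xs) * h


if __name__ == "__main__":
    def p1(x):
        return normal_pdf(x, 0.0, 1.0)

    def p2(x):
        return normal_pdf(x, 1.0, 2.0)

    d = matusita_distance(p1, p2, -30.0, 30.0)
    rho = affinity(p1, p2, -30.0, 30.0)
    print(d * d, 2.0 * (1.0 - rho))
```

The negative logarithm of the affinity is the Bhattacharyya distance, so a large Bhattacharyya distance corresponds to a Matusita distance close to its maximum value, the square root of 2.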

When $R_1 = dR_2$ ($d > 1$), the distributions of the QDF and
its plug-in version (i.e. with the parameters replaced by their
estimates) were studied by Han (1969, [23]). Similar results
were obtained by Han (1970, [24]) when the $R_j$'s are of
"circular" type. The same problem was treated by Gilbert
(1969, [19]). The author compared the total probability of
misclassification (PMC) resulting from the use of the QDF with


that of the LDF:





6 * ( iiiR + 1 —( 1 ^ 2 ) ^ JJ, » 

where 6 = ~ prior probability of assuming 

parameters are known. For the latter the optimum cut-off 
point for which the total PMC is minimized was obtained. 

Shumway and Unger (1974, [59]) considered LDFs
for stationary time series. They obtained the asymptotically
optimal LDF corresponding to the criterion of maximizing the
Kullback-Leibler distance using certain spectral approximations.
The optimizing discriminant function was applied to
seismic data from selected earthquakes and nuclear explosions.

The recent book by Seber (1984, [58]) contains interesting
discussions on discriminant analysis methodology and, in
addition, includes an extensive bibliography.

1.3 SUMMARY

The thesis is divided into seven chapters. Chapter II 
provides some definitions and elementary results needed in 
subsequent chapters of the present work. 

Chapter III contains a systematic study of optimal
classification rules based on linear statistics which maximize the
Bhattacharyya distance (B-D) in the case when the observed
process is discrete in time. Both stationary and non-stationary
cases have been considered. For an arbitrary time series, we
have to solve iteratively an implicit equation in order to
get our desired linear discriminant function (LDF); we
have given a simple method of ascertaining an interval
of convergence of the iteration process. Linear
procedures based on the B-D belong to the Anderson-Bahadur
admissible class for a proper choice of the cut-off
point. Some special categories of problems where the
mean vectors and the covariance matrices are of a specific
kind have also been studied here. We compare the performance
of our LDF with that of some other LDFs considered in the
literature. The comparison with the quadratic discriminant
function due to the Bayes' criterion is considered when the
covariance matrices are proportional. These comparisons
lead to the conclusion that the distance of our interest
is worth considering. A one-to-one correspondence has been
established between two classes of linear procedures: one
due to our criterion of maximizing the B-D, and another due
to minimizing the total probability of misclassification
subject to a linear relationship between the two types of
errors. Under certain regularity conditions,
a compact form of the LDF is obtained for a covariance
stationary time series when the sample size is large. Some
illustrative examples satisfying the regularity conditions
are given and it is shown that the errors of misclassification
tend to zero asymptotically.





Chapter IV deals with continuous time series. It
is shown that finding the optimal LDF amounts to solving
an integral equation of Fredholm type. We are able to
obtain a compact form of the LDF in the case when the time
series is covariance stationary and the observation interval
is infinite.

Chapter V is devoted to the design of signals. It is
shown that even in the simple case when the two types of
errors are made equal we are unable to obtain an explicit
expression for the optimum signal. An analytical solution
for the optimal signal is, however, available through a
bound on the total probability of misclassification.

Since complex normal processes are of interest in many
applied areas, in Chapter VI all the major results of the
preceding chapters have been extended to the case where
the underlying process is complex valued. Here too, some
special categories of problems of practical interest have
been discussed.

The concluding Chapter VII includes some suggestions 
for further investigation in this interesting field. 

Throughout this work, the underlying process is assumed
to be real valued, unless stated otherwise (Chapter VI).



CHAPTER II 


PRELIMINARIES 


2.1 INTRODUCTION 

In this chapter, some definitions and elementary results are
collected for later use in the present work. The basic idea of
statistical distance is given in Section 2.2. We define in
Section 2.3 the Bhattacharyya distance between two populations,
which is of primary concern to our investigation, and for easy
reference, we derive the Bhattacharyya distance between two
normal populations. This is followed by a remark where it is
shown how this distance measures the dissimilarity between them.
The relevant notions of covariance stationarity and spectral
density of a stochastic process are contained in Section 2.4
while Section 2.5 contains some basic concepts of complex normal
processes.

2.2 STATISTICAL DISTANCE MEASURES 


The concept of distance associated with a metric space is
well known. If x and y are two "points" in a metric space,
then d(x,y), called the distance between x and y, satisfies
the three properties:

i) non-negativity

ii) symmetry

iii) triangle inequality

As an example, take the metric space $\mathbb{R}^n$ and

$$d(x,y) = \Big[\sum_{i=1}^{n} |x_i - y_i|^p\Big]^{1/p}, \quad p \ge 1, \quad \text{for } x, y \in \mathbb{R}^n.$$

These various distance functions can be viewed as measures of
dissimilarity between two "points" in the metric space $\mathbb{R}^n$.

Analogously, several indices have been suggested in
statistical literature to reflect the degree of dissimilarity
between any two probability distributions. Such indices have
been variously called measures of distance between two distributions
(see [1], for instance), measures of separation ([56]), measures
of discriminatory information ([15,31]), and measures of
variation-distance ([28]). While these indices have not all
been introduced for exactly the same purpose, as the names
given to them imply, they have the common property of increasing
as the two distributions involved "move apart". An index with
this property may be called a measure of divergence of one distribution
from another. A general method of generating measures of divergence
has been discussed in a paper of Ali and Silvey ([2]) and it is
shown therein that various available measures of divergence belong
to a general class defined by their method.

We shall now give some examples of measures of divergence.

Let $(\Omega, \mathcal{F}, \nu)$ be a measure space and $\mathcal{P}$ be the set of all probability
measures on $\mathcal{F}$ which are absolutely continuous with respect to $\nu$.
Consider two such probability measures $P_1, P_2 \in \mathcal{P}$ and let $p_1$,
$p_2$ be their respective density functions with respect to $\nu$. Then

(i) $$J(1,2) = \int_\Omega (p_2 - p_1)\,\ln\frac{p_2}{p_1}\, d\nu$$

is known as Jeffreys' measure of divergence.

(ii) $$I(1,2) = \int_\Omega p_1 \ln\frac{p_1}{p_2}\, d\nu$$

and $$I(2,1) = \int_\Omega p_2 \ln\frac{p_2}{p_1}\, d\nu$$

are called Kullback-Leibler's measures of discriminatory
information ([31]).

(iii) Kolmogorov's measure of variational distance ([28])
is given by

4 / Is/’p^ - i/’p, 1 dv 
D ^ 

(iv) Matusita's measure of distance ([36]) is as follows:

$$\int_\Omega \big(\sqrt{p_1} - \sqrt{p_2}\big)^2\, d\nu$$

(v) Chernoff's ([15]) measure of discriminatory information
is defined by

$$-\ln \inf_{0<t<1} \Big(\int_\Omega p_1^{\,t}\, p_2^{\,1-t}\, d\nu\Big)$$
Remark 2.2.1 : Statistical distance functions need not satisfy all
the properties of a distance function stated at the beginning of
Section 2.2. We note in Section 2.3 that the Bhattacharyya
distance may not satisfy the triangle inequality.

2.3 BHATTACHARYYA DISTANCE 

In this work we shall be concerned with the Bhattacharyya
distance which was first introduced in a statistical context
by Bhattacharyya ([11]). An early and well known statistical
application of this measure was made by Kakutani ([27]), who
himself mentions an earlier appearance of this measure in a
non-statistical problem (see Hellinger [25]). Therefore,
the names of Hellinger and Kakutani are also often associated
with the Bhattacharyya distance.


We first define the Bhattacharyya coefficient for the
densities $p_i$ (i = 1,2) by

$$\rho_2(p_1,p_2) \triangleq \int_\Omega (p_1 p_2)^{1/2}\, d\nu \qquad (2.3.1)$$

More generally,

$$\rho_2(P_1,P_2) \triangleq \int_\Omega \Big(\frac{dP_1}{d\nu}\,\frac{dP_2}{d\nu}\Big)^{1/2} d\nu$$

where $dP_i/d\nu$ is the Radon-Nikodym derivative of $P_i$ (i = 1,2) with
respect to $\nu$. $\rho_2(P_1,P_2)$ is called the Bhattacharyya coefficient
or affinity between the two probability measures $P_1$ and $P_2$. Note
that $\rho_2(P_1,P_2)$ does not depend on the measure $\nu$ dominating $P_1$ and
$P_2$. $\rho_r(P_1,\dots,P_r)$, the affinity among r d.f.s $P_1,\dots,P_r$, can be
defined (see [3?]) analogously.

The following proposition states some properties of $\rho_2(P_1,P_2)$
which can be proved easily (see, for example, [39]).

PROPOSITION 1 :

i) $0 \le \rho_2(P_1,P_2) \le 1$

ii) $\rho_2(P_1,P_2) = 1$ if and only if $P_1 = P_2$

iii) $\rho_2(P_1,P_2) = 0$ if and only if $P_1 \perp P_2$.

The Bhattacharyya distance between two probability
distributions $P_1$ and $P_2$ is defined by

$$B \triangleq \text{the Bhattacharyya distance} = -\ln \rho_2(P_1,P_2) \qquad (2.3.2)$$

Clearly, $0 \le B \le \infty$.

The Bhattacharyya distance need not satisfy the triangle
inequality (see Appendix A).

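We shall now derive the Bhattacharyya distance between
two multivariate normal populations.

PROPOSITION 2 : Let $\Pi_1$ and $\Pi_2$ be two n-variate normal
populations with distributions $N(\mu_1,R_1)$ and $N(\mu_2,R_2)$ respectively,
where $R_1$ and $R_2$ are positive definite matrices. Then

$$-\ln\rho_2 = \tfrac14 P + \tfrac18 D \qquad (2.3.3)$$

where

$$P = 2\ln|R| - \ln|R_1| - \ln|R_2|, \qquad D = (\mu_1-\mu_2)'R^{-1}(\mu_1-\mu_2),$$

and

$$R = (R_1+R_2)/2.$$

Proof : Let the random vector X have the densities $p_j(x)$ under
$\Pi_j$ (j = 1,2) respectively. Then by definition,

$$\rho_2(1,2;x) = \int_{\mathbb{R}^n} \big(p_1(x)p_2(x)\big)^{1/2}\, dx$$

(for convenience, we write $\rho_2(1,2;x)$ in place of $\rho_2(P_1,P_2;x)$)

$$= \big[(2\pi)^{2n}|R_1||R_2|\big]^{-1/4} \int_{\mathbb{R}^n} \exp\Big[-\tfrac14\big\{(x-\mu_1)'R_1^{-1}(x-\mu_1) + (x-\mu_2)'R_2^{-1}(x-\mu_2)\big\}\Big] dx \qquad (2.3.4)$$

Using Appendix B, we can write

$$(x-\mu_1)'R_1^{-1}(x-\mu_1) + (x-\mu_2)'R_2^{-1}(x-\mu_2) = (x-m)'S(x-m) + (\mu_1-\mu_2)'(R_1+R_2)^{-1}(\mu_1-\mu_2)$$

where $m = (R_1^{-1}+R_2^{-1})^{-1}(R_1^{-1}\mu_1 + R_2^{-1}\mu_2)$
and $S = R_1^{-1}(R_1+R_2)R_2^{-1}$.

These relations when used in (2.3.4) give

$$\rho_2(1,2;x) = \frac{(|R_1||R_2|)^{1/4}}{|\tfrac12(R_1+R_2)|^{1/2}}\, \exp\Big\{-\tfrac14(\mu_1-\mu_2)'(R_1+R_2)^{-1}(\mu_1-\mu_2)\Big\} \qquad (2.3.5)$$

Hence the result.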
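
As a numerical companion to Proposition 2, the following is a minimal sketch of (2.3.3), assuming NumPy is available. The function name `bhattacharyya_gaussian` and the test values are illustrative and are not taken from the thesis.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, R1, mu2, R2):
    """Return (B, rho) for N(mu1, R1) and N(mu2, R2), following (2.3.3).

    B = P/4 + D/8 with P = 2 ln|R| - ln|R1| - ln|R2|,
    D = (mu1 - mu2)' R^{-1} (mu1 - mu2) and R = (R1 + R2)/2.
    R1 and R2 are assumed to be positive definite.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    R1, R2 = np.asarray(R1, float), np.asarray(R2, float)
    R = 0.5 * (R1 + R2)
    # slogdet returns (sign, log|det|); only the log-determinant is needed here
    P = 2.0 * np.linalg.slogdet(R)[1] - np.linalg.slogdet(R1)[1] - np.linalg.slogdet(R2)[1]
    d = mu1 - mu2
    D = d @ np.linalg.solve(R, d)
    B = 0.25 * P + 0.125 * D
    return B, np.exp(-B)

# Illustrative univariate case: N(0, 1) versus N(1, 4)
B_uni, rho_uni = bhattacharyya_gaussian([0.0], [[1.0]], [1.0], [[4.0]])
```

For n = 1 the returned coefficient can be compared directly with the closed form of Remark 2.3.1.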


Remark 2.3.1 : Putting n = 1, we have

$$\rho_2(1,2;x) = \frac{(\eta_1^2\eta_2^2)^{1/4}}{\{\tfrac12(\eta_1^2+\eta_2^2)\}^{1/2}}\, \exp\Big\{-\frac{(m_1-m_2)^2}{4(\eta_1^2+\eta_2^2)}\Big\} \qquad (2.3.6)$$

where the involved distributions are $N(m_1,\eta_1^2)$ and $N(m_2,\eta_2^2)$.


Remark 2.3.2 : (1) The Bhattacharyya distance comes out as a
special case of the Chernoff distance taking $t = \tfrac12$.

(2) We note that an n-variate normal distribution
is completely specified by the set of $n(n+3)/2$ parameters

$$\{\mu_1,\dots,\mu_n,\ r_{11},\dots,r_{nn},\ r_{12},\dots,r_{(n-1)n}\}$$

of which the first n are the location parameters and the
remaining are the orientation parameters. Naturally, the
dissimilarity between the populations can be judged through
the disagreement between the corresponding location and
orientation parameters of the two populations. Now, an
examination of the expression (2.3.3) reveals that the
first term measures the dissimilarity of the two populations
with respect to their orientations, while the second term
does so in terms of their locations. In other words, P measures
the divergence in the dispersion matrices and D between the
mean values, and the total divergence is a weighted sum of the
two.

2.4 STOCHASTIC PROCESSES : SOME DEFINITIONS AND RESULTS

Let $\{X(t),\ t \in T\}$ be a stochastic process defined on
a probability space $(\Omega, \mathcal{F}, P)$ where the index set T may be
discrete or continuous. In the discrete case, T may be one
of the forms $\{0, \pm1, \pm2, \dots\}$ and $\{0, 1, 2, \dots\}$, and in
the continuous case T may be $\{t : t \ge 0\}$ or $\{t : -\infty < t < \infty\}$.

The mean function is defined by $\mu(t) \triangleq EX(t)$, $t \in T$. The function
$K(t_1,t_2) = \mathrm{Cov}(X(t_1),X(t_2)) = E(X(t_1)-\mu(t_1))(X(t_2)-\mu(t_2))$,
for $t_1, t_2 \in T$, is called the covariance kernel of the process.

A stochastic process is said to be covariance stationary ([44]) if
it possesses finite second moments and if its covariance kernel
K(t,s) is a function only of the absolute difference $|s-t|$,
in the sense that there exists a function R(v) such that for
all s and t in T,

$$K(t,s) = R(|s-t|),$$

or more precisely, R(v) has the property that for every t
and v in T,

$$\mathrm{Cov}(X(t),X(t+v)) = R(v) \qquad (2.4.1)$$

We call R(v) the covariance function of the covariance
stationary stochastic process $\{X(t),\ t \in T\}$.

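A stochastic process with discrete time parameter which
is covariance stationary defines a sequence of covariances,
say, r(0), r(1), ... . The Fourier transform of the sequence
is defined by (see [ S]),

$$f(\lambda) = \sum_{n=-\infty}^{\infty} r(n)\, e^{i\lambda n}, \quad -\pi \le \lambda \le \pi \qquad (2.4.2)$$

provided $\sum_{n=-\infty}^{\infty} |r(n)| < \infty$, which will ensure that the series
(2.4.2) converges uniformly and absolutely to the function
$f(\lambda)$ (see [is]). The function $f(\lambda)$ is called the spectral density
associated with $\{r(n)\}_{n=-\infty}^{\infty}$ or of the process $\{X(t),\ t \in T\}$.

From (2.4.2), one can obtain,

$$r(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-i\lambda n} f(\lambda)\, d\lambda \qquad (2.4.3)$$

For a continuous time-parameter covariance stationary process
the spectral density is analogously defined as

$$f(\lambda) = \int_{-\infty}^{\infty} r(h)\, e^{i\lambda h}\, dh \qquad (2.4.4)$$

provided $\int_{-\infty}^{\infty} |r(v)|\, dv < \infty$, and then r(v) is given by

$$r(v) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\lambda v} f(\lambda)\, d\lambda \qquad (2.4.5)$$

Since r(n) = r(-n) for all n, (2.4.2) and (2.4.3) can be rewritten
as

$$f(\lambda) = \sum_{n=-\infty}^{\infty} r(n) \cos \lambda n \qquad (2.4.6)$$

and

$$r(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\lambda) \cos \lambda n\, d\lambda \qquad (2.4.7)$$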
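
The pair (2.4.6) and (2.4.7) can be checked numerically on a simple covariance sequence. The sketch below assumes the illustrative sequence $r(n) = \phi^{|n|}$ with $|\phi| < 1$, which is not taken from the thesis; its spectral density has the well-known closed form $(1-\phi^2)/(1-2\phi\cos\lambda+\phi^2)$.

```python
import numpy as np

phi = 0.6  # hypothetical coefficient with |phi| < 1, chosen only for illustration

def spectral_density_series(lam, n_max=200):
    """Truncated cosine series (2.4.6) for r(n) = phi**|n|."""
    n = np.arange(-n_max, n_max + 1)
    return np.sum(phi ** np.abs(n) * np.cos(lam * n))

def spectral_density_closed_form(lam):
    """Closed form of the same series (a Poisson kernel)."""
    return (1.0 - phi ** 2) / (1.0 - 2.0 * phi * np.cos(lam) + phi ** 2)

def covariance_from_density(n, grid=4096):
    """Inversion formula (2.4.7), evaluated with a uniform grid on [-pi, pi)."""
    lam = -np.pi + 2.0 * np.pi * np.arange(grid) / grid
    # the mean over a full period equals (1/2pi) times the integral
    return np.mean(spectral_density_closed_form(lam) * np.cos(lam * n))

series_value = spectral_density_series(0.7)
closed_value = spectral_density_closed_form(0.7)
recovered_r3 = covariance_from_density(3)
```

The recovered covariance at lag 3 should agree with $\phi^3$ up to numerical error, which illustrates that (2.4.6) and (2.4.7) are inverse to each other.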


2.5 COMPLEX NORMAL PROCESSES 

A complex stochastic process $\{Z(t),\ t \in T\}$ is said to
be a complex normal process if the real vector $(\mathrm{Re}\, z', \mathrm{Im}\, z')'$
of 2n dimension has a 2n-variate multivariate normal distribution
with mean

$$(\mathrm{Re}\,\mu', \mathrm{Im}\,\mu')' \qquad (2.5.1)$$

and covariance matrix

$$\frac12 \begin{pmatrix} \mathrm{Re}\, R & -\mathrm{Im}\, R \\ \mathrm{Im}\, R & \mathrm{Re}\, R \end{pmatrix} \qquad (2.5.2)$$

where

$$z = (Z(t_1),\dots,Z(t_n))'_{n\times 1}, \qquad \mu = Ez, \qquad R = E(z-\mu)(\bar z - \bar\mu)'$$

for any choice of distinct points $t_1, t_2, \dots, t_n$ in T and for
any n. R is assumed to be a positive definite Hermitian matrix.
$\bar z$ denotes the complex conjugate of z.

ro 

A complex random vector $z = x + iy$ is said to have an
n-dimensional complex multivariate normal distribution with mean $\mu$
and covariance matrix R if $(x', y')'$ of 2n dimension is distributed
as a multivariate normal vector with mean vector and covariance
matrix as specified in (2.5.1) and (2.5.2) respectively.

The density function of z is given by

$$p(z) = \pi^{-n}|R|^{-1} \exp\{-(\bar z - \bar\mu)'R^{-1}(z-\mu)\} \qquad (2.5.3)$$

which is a real-valued scalar function of the complex vector z.

We note $p(z) > 0$ and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(z)\, dx\, dy = 1$.

One can easily show:

$$E(z-\mu)(z-\mu)' = 0 \qquad (2.5.4)$$

We note that if $z = x + iy$ has an n-dimensional complex
multivariate normal distribution with mean vector $\mu$ and positive
definite covariance matrix R, then the marginal density of x
is n-dimensional normal with mean $\mathrm{Re}\,\mu$ and covariance matrix
$\tfrac12 \mathrm{Re}\, R$. Similarly, y has an n-variate normal distribution with
mean $\mathrm{Im}\,\mu$ and the same covariance matrix $\tfrac12 \mathrm{Re}\, R$.

In the following we define the spectral density of a covariance
stationary time series:

$$f_z(\lambda) = \sum_{h=-\infty}^{\infty} \sigma(h)\, e^{-i\lambda h}, \quad -\pi \le \lambda \le \pi,$$

is called the spectral density of the time series $\{Z(t),\ t \in T\}$, where

$$\sigma(h) = \mathrm{Cov}(Z(t), Z(t+h)),$$

provided $\sum_{h=-\infty}^{\infty} |\sigma(h)| < \infty$.

All the material presented in this section can be found
in Miller ([40]).
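
The real representation in (2.5.2) can be illustrated with a short sketch. It assumes NumPy; the Hermitian matrix R below is randomly generated purely for illustration, and `real_covariance` is a hypothetical helper name. The sketch builds the 2n x 2n real covariance matrix and recovers both $E(z-\mu)(\bar z-\bar\mu)' = R$ and the relation (2.5.4) from its blocks.

```python
import numpy as np

def real_covariance(R):
    """2n x 2n covariance of (x', y')' for z = x + iy, as in (2.5.2)."""
    R = np.asarray(R, dtype=complex)
    return 0.5 * np.block([[R.real, -R.imag], [R.imag, R.real]])

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
R = A @ A.conj().T + n * np.eye(n)       # an arbitrary positive definite Hermitian matrix

S = real_covariance(R)
Cxx, Cxy = S[:n, :n], S[:n, n:]          # Cov(x, x), Cov(x, y)
Cyx, Cyy = S[n:, :n], S[n:, n:]          # Cov(y, x), Cov(y, y)

# E (z - mu)(conj(z) - conj(mu))' expressed through the real blocks
hermitian_cov = Cxx + Cyy + 1j * (Cyx - Cxy)
# E (z - mu)(z - mu)' expressed through the real blocks, cf. (2.5.4)
pseudo_cov = Cxx - Cyy + 1j * (Cxy + Cyx)
```

The block structure forces the second quantity to vanish, which is the content of (2.5.4), and the diagonal blocks show that x and y each have covariance matrix $\tfrac12\mathrm{Re}\,R$.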



CHAPTER III 


LINEAR DISCRIMINANT FUNCTIONS FOR DISCRETE TIME SERIES

3.1 INTRODUCTION

In this chapter, we consider observations which constitute
a discrete time series. The case of continuous time series is
taken up in the next chapter. In Section 3.2, the problem of
two-group classification is mathematically formulated as one
of finding an optimal classification rule based on linear
statistics which maximize the Bhattacharyya distance associated
with the groups involved. We obtain in Section 3.3 an expression
for the optimal linear discriminant function (LDF) in the sense
of maximizing the Bhattacharyya distance and show that the
resulting linear procedure belongs to the Anderson-Bahadur
admissible class for a proper choice of the cut-off point. In
the same section, a simple method of ascertaining the interval
of convergence of the iteration process involved in finding
the LDF is discussed. Some special categories of problems
are also studied here. In this section we also compare the
performance of our LDF with that of some other LDFs considered
in the literature. The comparison with the quadratic discriminant
function due to the Bayes criterion is presented in Section 3.4
when the covariance matrices are proportional. A one-to-one
correspondence between the two classes of linear procedures,
one due to our criterion of maximizing the Bhattacharyya
distance and another of minimizing the total probability
of misclassification subject to a linear relationship
between the two types of errors, has been established
in Section 3.5. Section 3.6 provides, under certain
regularity conditions, a compact form of the LDF for
covariance stationary time series when the sample size
is large. In the same section some illustrative examples
are given and it is shown that the errors of misclassification
tend to zero asymptotically.

3.2 MATHEMATICAL FORMULATION OF THE PROBLEM 

The problem is to classify a normal time series
$X = (X(0),\dots,X(n-1))'$ as belonging to one of the two
populations described by the two hypotheses $H_1$ and $H_2$.
These hypotheses specify that the n×1 normal time series
X has means and covariances $\mu_1, R_1$ and $\mu_2, R_2$ under $H_1$ and
$H_2$ respectively.

A typical problem of this kind arises in communication
engineering. We shall consider this example in Chapter V
from the viewpoint of "signal design".

The admissible procedures for classification provided
by the Neyman-Pearson theory as well as the Bayes rule are
based on the likelihood ratio. If $R_1 = R_2$, the likelihood
ratio depends on a linear function of X (called Fisher's
discriminant function); this case has been extensively
studied in the literature ([3]). If $R_1 \ne R_2$, Bayes'
rule states:


assign x to $H_1$ (or $H_2$) according as

$$x'(R_1^{-1}-R_2^{-1})x - 2x'(R_1^{-1}\mu_1 - R_2^{-1}\mu_2) \le (\text{or} >)\ \ln\frac{|R_2|}{|R_1|} - \mu_1'R_1^{-1}\mu_1 + \mu_2'R_2^{-1}\mu_2 - 2k, \qquad (3.2.1)$$

$$k = \ln\{\omega_2 C(1/2)/\omega_1 C(2/1)\},$$

where $\omega_j$ is the prior probability of $H_j$ and C(j/i) is
the cost of misclassifying an x to $H_j$ when x actually
belongs to $H_i$ ($i \ne j$; i,j = 1,2). (Note that C(1/1) = C(2/2) = 0.)
The quantity $x'(R_1^{-1}-R_2^{-1})x - 2x'(R_1^{-1}\mu_1 - R_2^{-1}\mu_2)$ is called
the quadratic discriminant function (QDF), and in the case
of unequal covariance matrices, one has to use a QDF since
$(R_1^{-1}-R_2^{-1})$ does not vanish. Unfortunately, the distribution
theory pertaining to the quadratic part involves a weighted
sum of non-central chi-square random variables ([59]), so
that computing the error rates resulting from its use seems
difficult. Hence the usual approach has been ([4,31,59])
to consider a linear procedure. In other words, in a situation
where the two populations are centered around different points
and have different patterns of scatter, and where one considers
multivariate normal distributions to be reasonably good
approximations for those two populations, one may want to
divide the sample space into two regions of classification
by some simple curve or surface. The simplest is a line
or hyperplane; the procedure may then be termed linear.

We now define linear procedures formally. Let $\alpha$ be
an n×1 vector and c be a scalar. An observation x is classified
as coming from the population under $H_1$ if $\alpha'x \ge c$ and from
$H_2$ otherwise. Briefly, we write:

$$\alpha'x \ \underset{H_2}{\overset{H_1}{\gtrless}}\ c \quad \text{(accept)} \qquad (3.2.2)$$

Thus the problem is to find $\alpha$ and c in some optimal
way. We assume $R_1$ and $R_2$ (possibly unequal)
to be positive definite covariance matrices. The parameters
are assumed to be known. Under these conditions, we study
the classification rule based on linear statistics which is
optimal with respect to our criterion of maximizing the
Bhattacharyya distance. We call $\alpha'x$ the "linear discriminant
function" ([4]).


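3.3 METHOD FOR OBTAINING $\alpha$ IN THE DISCRIMINANT FUNCTION

Under $H_1$ and $H_2$, the parameters of the normal distribution
of the linear discriminant function $y = \alpha'x$ are:

$$E_{H_1}(y) = \alpha'\mu_1, \quad \mathrm{Var}_{H_1}(y) = \alpha'R_1\alpha, \qquad E_{H_2}(y) = \alpha'\mu_2, \quad \mathrm{Var}_{H_2}(y) = \alpha'R_2\alpha. \qquad (3.3.1)$$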

It follows from (2.3.3) that

$$\ln\rho_2(1,2;y) = \tfrac14\big[\ln(\alpha'R_1\alpha) + \ln(\alpha'R_2\alpha)\big] - \tfrac12\ln\{\tfrac12\alpha'(R_1+R_2)\alpha\} - \tfrac14\,\frac{(\alpha'\delta)^2}{\alpha'(R_1+R_2)\alpha} \qquad (3.3.2)$$

where $\delta = \mu_1 - \mu_2$.

Differentiating ([57]) (3.3.2) with respect to $\alpha$, we get

$$\frac{\partial \ln\rho_2}{\partial\alpha} = \tfrac14\Big[\frac{2R_1\alpha}{\alpha'R_1\alpha} + \frac{2R_2\alpha}{\alpha'R_2\alpha}\Big] - \tfrac12\,\frac{2(R_1+R_2)\alpha}{\alpha'(R_1+R_2)\alpha} - \tfrac14\Big[\frac{2(\alpha'\delta)\delta\,\{\alpha'(R_1+R_2)\alpha\} - (\alpha'\delta)^2\, 2(R_1+R_2)\alpha}{\{\alpha'(R_1+R_2)\alpha\}^2}\Big]$$

Setting $\partial\ln\rho_2/\partial\alpha = 0$ gives

$$\frac{(\alpha'\delta)\,\delta}{\alpha'(R_1+R_2)\alpha} = \Big[\frac{(\alpha'\delta)^2}{\{\alpha'(R_1+R_2)\alpha\}^2} + \frac{1}{\alpha'R_1\alpha} - \frac{2}{\alpha'(R_1+R_2)\alpha}\Big] R_1\alpha + \Big[\frac{(\alpha'\delta)^2}{\{\alpha'(R_1+R_2)\alpha\}^2} + \frac{1}{\alpha'R_2\alpha} - \frac{2}{\alpha'(R_1+R_2)\alpha}\Big] R_2\alpha$$

$\Rightarrow\ (t_1R_1 + t_2R_2)\alpha = \delta$, where $t_1, t_2$ are given by

$$t_i = \Big[\frac{(\alpha'\delta)^2}{\{\alpha'(R_1+R_2)\alpha\}^2} + \frac{1}{\alpha'R_i\alpha} - \frac{2}{\alpha'(R_1+R_2)\alpha}\Big] \Big/ \Big[\frac{\alpha'\delta}{\alpha'(R_1+R_2)\alpha}\Big], \quad i = 1,2,$$

$\Rightarrow\ \alpha = (t_1R_1+t_2R_2)^{-1}\delta$, provided $(t_1R_1+t_2R_2)$ is non-singular.

Therefore

$$\alpha = \frac{1}{t_1} R_\theta^{-1}\delta \qquad (3.3.3)$$

where

$$R_\theta = R_1 - \theta R_2 \qquad (3.3.4)$$

and

$$-\theta \triangleq t_2/t_1 \qquad (3.3.5)$$

That is, the value of $\alpha$ for which $-\ln\rho_2(1,2;y)$ is a maximum
satisfies (3.3.3).

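Remark 3.3.1 : $-\ln\rho_2(1,2;\alpha'x)$ is invariant under scalar
multiplication of $\alpha$. It follows immediately if we write
(3.3.2) in the following form:

$$\ln\rho_2(1,2;\alpha'x) = \ln\frac{\{(\alpha'R_1\alpha)(\alpha'R_2\alpha)\}^{1/4}}{\{\tfrac12\alpha'(R_1+R_2)\alpha\}^{1/2}} - \tfrac14\,\frac{(\alpha'\delta)^2}{\alpha'(R_1+R_2)\alpha} \qquad (3.3.2)'$$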



Remark 3.3.2 : If $\alpha_0$ is a solution of (3.3.3), so is
$\alpha = k\alpha_0$, k being any non-zero scalar. It follows
trivially from (3.3.3).

Remark 3.3.3 : Following the above remarks, we can say
that the maximization of $-\ln\rho_2(1,2;\alpha'x)$ is irrespective
of the value of $1/t_1$ attached as a factor to $R_\theta^{-1}\delta$. Hence
$\alpha = \frac{1}{t_1}R_\theta^{-1}\delta$ and $\alpha = \frac{1}{\tau}R_\theta^{-1}\delta$, where $\tau$ is any non-zero scalar,
give the same optimal solutions. Thus the problem of
determining $t_1$ and $t_2$ reduces to finding the ratio
$t_2/t_1$. We can assign any value to $\tau$; this would not affect
the desired maximization process. We take $\tau = 1$.

Consequently, the required optimal solutions are
necessarily of the form:

$$\alpha = R_\theta^{-1}\delta \qquad (3.3.6)$$

where

$$-\theta = \frac{\dfrac{(\alpha'\delta)^2}{\alpha'(R_1+R_2)\alpha} + \dfrac{\alpha'(R_1-R_2)\alpha}{\alpha'R_2\alpha}}{\dfrac{(\alpha'\delta)^2}{\alpha'(R_1+R_2)\alpha} - \dfrac{\alpha'(R_1-R_2)\alpha}{\alpha'R_1\alpha}} \qquad (3.3.7)$$

It is clear that (3.3.6) is an implicit equation in $\alpha$.
Hence an iterative procedure must be employed to solve for $\alpha$.

3.3.1 AN EXAMPLE 

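Initial or entering values of $\alpha$ are required to begin
an iterative procedure. The entering value for $\alpha$ is taken as

$$\alpha^{(0)} = (R_1+R_2)^{-1}\delta.$$

With $\alpha^{(0)}$ determined, values for $\alpha^{(0)\prime}\delta$, $\alpha^{(0)\prime}R_1\alpha^{(0)}$, $\alpha^{(0)\prime}R_2\alpha^{(0)}$
are found and then $\theta^{(0)}$. Cycle 1 is begun by entering with
$\theta^{(0)}$ to find a new $\alpha$ from

$$\alpha^{(1)} = (R_1 - \theta^{(0)}R_2)^{-1}\delta$$

and then determining $\alpha^{(1)\prime}\delta$, $\alpha^{(1)\prime}R_1\alpha^{(1)}$, $\alpha^{(1)\prime}R_2\alpha^{(1)}$ and
then $\theta^{(1)}$, thus completing the first cycle. This procedure
is continued until the difference in successive $\theta$'s is
as small as desired.

We shall illustrate the procedure described with data
given in Kullback (Chapter 13, [31]).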

Example 3.3.1

$$X \sim N_2\left(\begin{pmatrix} 20.80 \\ 12.32 \end{pmatrix}, \begin{pmatrix} 6.92 & -5.27 \\ -5.27 & 40.89 \end{pmatrix}\right) \text{ under } H_1$$

$$X \sim N_2\left(\begin{pmatrix} 12.80 \\ 36.40 \end{pmatrix}, \begin{pmatrix} 36.75 & 13.92 \\ 13.92 & 287.92 \end{pmatrix}\right) \text{ under } H_2$$

Then

$$\delta = \begin{pmatrix} 8.00 \\ -24.08 \end{pmatrix}$$

We have programmed the procedure (see Appendix C). The
result correct up to 4 decimal places is as follows (number of
iterations taken is 3):

$$\alpha = (1, -0.4153)'$$

Thus the linear discriminant function is: $y = x_1 - 0.4153\,x_2$.

The value of $\theta$ for which we get $\alpha$ is given by $\theta = -0.4152$.
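
The cycle described in Section 3.3.1 can be sketched as follows. This is a minimal illustration assuming NumPy, and it is not the program of Appendix C. The function names are hypothetical, and $\theta$ is evaluated from the ratio $-\theta = t_2/t_1$ of (3.3.5), written in terms of $\alpha'R_1\alpha$, $\alpha'R_2\alpha$ and $\alpha'\delta$.

```python
import numpy as np

def theta_from_alpha(alpha, R1, R2, delta):
    """Evaluate theta from -theta = t2/t1 for a given alpha."""
    q1 = alpha @ R1 @ alpha                 # alpha' R1 alpha
    q2 = alpha @ R2 @ alpha                 # alpha' R2 alpha
    g = (alpha @ delta) ** 2 / (q1 + q2)    # (alpha' delta)^2 / alpha'(R1+R2)alpha
    # the denominator is assumed to be non-zero
    minus_theta = (g + (q1 - q2) / q2) / (g - (q1 - q2) / q1)
    return -minus_theta

def bhattacharyya_ldf(mu1, R1, mu2, R2, tol=1e-10, max_iter=100):
    """Iterate alpha = (R1 - theta R2)^{-1} delta until theta settles."""
    delta = mu1 - mu2
    alpha = np.linalg.solve(R1 + R2, delta)          # entering value alpha^(0)
    theta = theta_from_alpha(alpha, R1, R2, delta)   # theta^(0)
    for _ in range(max_iter):
        alpha = np.linalg.solve(R1 - theta * R2, delta)
        new_theta = theta_from_alpha(alpha, R1, R2, delta)
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    # scale so that the first coefficient is 1 (assumes alpha[0] != 0)
    return alpha / alpha[0], theta

# Data of Example 3.3.1
mu1 = np.array([20.80, 12.32])
R1 = np.array([[6.92, -5.27], [-5.27, 40.89]])
mu2 = np.array([12.80, 36.40])
R2 = np.array([[36.75, 13.92], [13.92, 287.92]])
alpha_hat, theta_hat = bhattacharyya_ldf(mu1, R1, mu2, R2)
```

For the data of Example 3.3.1 the iteration should settle close to the coefficient and the value of $\theta$ quoted there. The stopping tolerance and the iteration cap are arbitrary choices made for this sketch.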

3.3.2 CONVERGENCE OF THE ITERATION PROCESS

Since $R_1$ and $R_2$ are positive definite matrices by
assumption, there always exists a non-singular matrix P
such that

$$R_1 = P'P$$

and

$$R_2 = P'\Lambda P, \qquad \Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_n),$$

where the $\lambda_i$'s (i = 1,...,n) are the characteristic roots of
$R_2R_1^{-1}$ (see [57]).

Then $\alpha'R_1\alpha = \alpha'P'P\alpha = \beta'\beta$, say, where $P\alpha = \beta$,

$\alpha'R_2\alpha = \alpha'P'\Lambda P\alpha = \beta'\Lambda\beta$,

$\alpha'\delta = \beta'(P')^{-1}\delta = \beta'\eta$, say, where $\delta = P'\eta$.

Now

$$\beta = P\alpha = P(R_1-\theta R_2)^{-1}\delta = P[P'P - \theta P'\Lambda P]^{-1}\delta = P[P'(I-\theta\Lambda)P]^{-1}\delta = (I-\theta\Lambda)^{-1}(P')^{-1}\delta,$$

i.e. $\beta = (I-\theta\Lambda)^{-1}\eta$.

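Thus (3.3.7) reduces to

$$-\theta = \frac{\dfrac{(\beta'\eta)^2}{\beta'(\Lambda+I)\beta} + \dfrac{\beta'(I-\Lambda)\beta}{\beta'\Lambda\beta}}{\dfrac{(\beta'\eta)^2}{\beta'(\Lambda+I)\beta} - \dfrac{\beta'(I-\Lambda)\beta}{\beta'\beta}} \qquad (3.3.8)$$

(where

$$A \triangleq \frac{(\beta'\eta)^2}{\beta'(\Lambda+I)\beta} - \frac{\beta'(I-\Lambda)\beta}{\beta'\beta}$$

denotes the denominator), provided $A \ne 0$;
the case corresponding to A = 0 can be treated separately.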


Thus 


A+A9 - 


- 0 = 


fi’Ap 


P'Ap 


- A(i+e)p'p 





or , 9 


T aking 

( 9 ) ^ 


Hi. 

?( 9 ) 


+ (1+0) ( 


- (■ 


6'(A+I)p p' (A+I)p 




(3.3.9) 


the derivative of $f(\theta)$ with respect to $\theta$, we have

,2 


n 2A. T)1 
id ^ ^ 


(P'np)2 1=1 (1-9X ) 

* JL 


n X?T)? 

■-,)(P'/'P) - (P’3) 2 £ _X-1_ } 

3^1;^ (1-0X^)3 


+ {■ 


(- 


£’2 .2 


P’(A+I)p ^'(^*+1)3 


r ] (3’3) 


+ 2(1+0) ( £ 


n X . T)f 2 


A 1. 


i) ^ 




i=l (1-0X.)'^' e'(A4‘I)p P' (A+I)P 

i #o ro ♦o #**; 

( £ 


2 n 2(l+Xj^)X^T)^ 


+ (l+9)(fi'fi)[- o( 2 — - --^ 

{P' (A+I)p}^ i=l (1-SX.) 


3 ^ 






_ {(p« Ca-i»i)p)( z ^ - , ) 

{p‘(A+I)p}3 ~ ~ i=l (1-0X.)2 


n X^(H-X.)r,r 


n 


-(p’ t7)2 £ 1 ] 

~ " i=l (l-0Xj^)'^ 

.2 


n ’^i 


- 2 


i=l (l-QXj.) 

n X.T)? 
£ 


^ i=l (1-0X. )^ n xJt)? 


+ 2 


( S 




n 


{ £ 


i=l (1-ex^)^ 'i=l (1-9X^)2 


^vf 2 i=i (i-e?^i) 


3 ^ 





n 
( E 


n: 


n 


•) ( £ 


^2 


+ 2 


i=l (1-SX^)2 1=1 (1-8X^) i=l (1-9 >^) 


n 

i=l 


1=1 (l-0Xj)^ 


n X . T)t n 

£ ± 4..,— £ 

1=1 (l-ex. n X.77? 1=J 

+ 4 (i+e) i — - 2Ci+e)( 2 — ^ — ^)( 

n i=l (l-eXi)S „ 

T' 4 4- « 


t}- 


n (1+XO^ 

2 i — . 

1=1 (1-ex^)- 


1=1 1-ex. 

1_ — )■ 

1=1 (i-ex^)2 


n 

2 


T?. 


-4(i-^e) 


1=1 (i-ex. n X. (i+x. )nf 

- ^.— f rn . 4 - s 4 4 


n ( 1 +X.)d 2 

r- __± i.\-^ 


( £ , 

1=1(1_SX^) 


( 

2 1=1 (i-eXj) 

2^ 




■) 


n 

2 


4 


1=1 ( 1 -ex. n 
_ 2(1+0) ± ( z 


h? i. 


n 


) ( S 




{ 2 


n ( 1 +X ,)„2 i=l ‘=1 

2 


i=l (l-8Xj)' 


n 

( £ 




)' 


+4(i+-e)- 


1=1 Ci-0X^) 
n (1+X.)T)? 


n 
( S 


75^ 

1 


n X.( 1 +X.)t )2 

) ( £ — hr^) 


C 2 

1=1 (1-9 Xj^)' 


1=1 Ci-9X^)'^ 1=1 (1-ex^) 





One can easily (plot $f'(\theta)$ against $\theta$ and) check on
which interval(s) of the real line the following condition
is satisfied:

$$|f'(\theta)| \le k < 1, \qquad (3.3.10)$$

or, equivalently, see where $f'(\theta)$ crosses the $f'(\theta) = \pm 1$
lines. If on some interval [a,b], (3.3.10) holds, then for
any point $\theta_0$ of this interval the sequence of points
$\theta_0, \theta_1, \dots, \theta_n, \dots$, where $\theta_n = f(\theta_{n-1})$, converges to the
root of the equation $\theta = f(\theta)$ in [a,b] (see [33,65]).

Remark 3.3.4 : Noting the expression for $f(\theta)$ in (3.3.9),
it is clear that once we find a P which simultaneously
diagonalizes $R_1$ and $R_2$, the inversion of matrices can be
avoided in carrying out the iteration under consideration.
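
The point of Remark 3.3.4 can be illustrated with a short sketch, assuming NumPy. It is an assumption of this sketch that P is obtained from a Cholesky factor of $R_1$ followed by a symmetric eigen-decomposition, which is one standard construction and not necessarily the method of Remark 5.3.1; the function names are hypothetical. Once P, the $\lambda_i$ and $\eta$ are available, each cycle only needs $\beta = (I-\theta\Lambda)^{-1}\eta$, which is an elementwise division, and the ratio in (3.3.8).

```python
import numpy as np

def simultaneous_diagonalizer(R1, R2):
    """Return P and lambda with R1 = P'P and R2 = P' diag(lambda) P."""
    L = np.linalg.cholesky(R1)                       # R1 = L L'
    L_inv = np.linalg.inv(L)
    lam, Q = np.linalg.eigh(L_inv @ R2 @ L_inv.T)    # roots of R2 R1^{-1}
    return Q.T @ L.T, lam

def theta_map(theta, lam, eta):
    """One cycle in the diagonal coordinates, using (3.3.8)."""
    beta = eta / (1.0 - theta * lam)                 # (I - theta*Lambda)^{-1} eta
    bb = beta @ beta                                 # beta' beta
    bLb = beta @ (lam * beta)                        # beta' Lambda beta
    g = (beta @ eta) ** 2 / (bb + bLb)
    return -(g + (bb - bLb) / bLb) / (g - (bb - bLb) / bb)

# Data of Example 3.3.1
R1 = np.array([[6.92, -5.27], [-5.27, 40.89]])
R2 = np.array([[36.75, 13.92], [13.92, 287.92]])
delta = np.array([8.00, -24.08])

P, lam = simultaneous_diagonalizer(R1, R2)
eta = np.linalg.solve(P.T, delta)                    # delta = P' eta
theta = -1.0                                         # corresponds to alpha = (R1 + R2)^{-1} delta
for _ in range(100):
    new_theta = theta_map(theta, lam, eta)
    done = abs(new_theta - theta) < 1e-10
    theta = new_theta
    if done:
        break
```

Starting from $\theta = -1$ reproduces the entering value of Section 3.3.1, so for the data of Example 3.3.1 this sketch should approach the same $\theta$ as the matrix form of the iteration.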


An Illustrative Example 


Example 3.3.2


Take R^ 



and d = I 


L 


0.20'^ 

1.16 


We first find P which simultaneously diagonalizes 
Rj^ and R 2 applying the transformation m = P . The 
method of finding P is described in Remark 5.3.1, Chapter V. 
We have , 






D 


1 

2 


'^172 


0 


Thus , EH 
required P = 


E R^E 
.41 


r .64 
,64 

.64 


^78 

0 

1 .87 

.79 
-.79 J 


and conseouently the 


C .64 -.79^ - ^.64 -.64) A = ( 


•79 .79, „ _ ,0.41 


^ ) 
1.87-' 


Now. „ = (p')-h = C;—) . 

We note that $A \ne 0$, where A is defined in (3.3.8).


,0.8704> 


Hence we are in a position to compute $f'(\theta)$.

We see that in [-9,-8], $f'(\theta)$ satisfies $0 < f'(\theta) < 1$.

The graph of $f'(\theta)$ is plotted in Fig. 3.1.

Thus $f(\theta)$ is a contraction mapping in [-9,-8]. We now verify
this result by directly carrying out the iteration involved
in (3.3.6). The following table shows this verification.




Fig. 3.1 : Graph of $f'(\theta)$





Table 3.1

Verification in Example 3.3.2

Initial value $\theta_0$     Final value     Number of iterations taken

-9.0                  -8.9095         4

-8.5                  -8.9093         5

0 

(DO 

1 

-8.9093 

1 


3.3.3 SOME SPECIAL CATEGORIES OF PROBLEMS 


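We first consider the following problem where the
difference of the mean vectors of x under $H_1$ and $H_2$ is the null vector.

(1) As $\delta = 0$,

$$\rho_2(1,2;y) = \frac{\{(\alpha'R_1\alpha)(\alpha'R_2\alpha)\}^{1/4}}{\{\tfrac12\alpha'(R_1+R_2)\alpha\}^{1/2}} \qquad (3.3.11)$$

Since there always exists a non-singular matrix P such that

$$R_1 = P'P \quad \text{and} \quad R_2 = P'\Lambda P,$$

where $\Lambda$ is diagonal with elements the eigenvalues of the
matrix $R_2R_1^{-1}$, we can rewrite (3.3.11) as

$$\rho_2(1,2;y) = \frac{(\beta'\Lambda\beta/\beta'\beta)^{1/4}}{\{\tfrac12(1+\beta'\Lambda\beta/\beta'\beta)\}^{1/2}} \qquad (3.3.12)$$

where

$$P\alpha = \beta. \qquad (3.3.13)$$

Then we have the following theorem.

Theorem 3.3.1 : The optimal $\beta$ is the eigenvector
corresponding to $\lambda_{\min}(R_2R_1^{-1})$ or $\lambda_{\max}(R_2R_1^{-1})$ according as

$$\lambda_{\min}(R_2R_1^{-1})\,\lambda_{\max}(R_2R_1^{-1}) \le (\text{or} >)\ 1. \qquad (3.3.14)$$

The optimal $\alpha$ is obtained by solving (3.3.13).

Proof : We see that (3.3.12) is, apart from a constant factor, of the form

$$y = \frac{x^{1/4}}{(1+x)^{1/2}}$$

where $x \in [a,b]$, and let a < 1 < b (justification of which
will be seen soon).

Then

$$\frac{dy}{dx} = \frac14\,\frac{1-x}{(1+x)^{3/2}\, x^{3/4}}.$$

Thus y is increasing in [a,1] and decreasing in [1,b].

Now, we know that

$$\lambda_{\min}(\Lambda) \le \frac{\beta'\Lambda\beta}{\beta'\beta} \le \lambda_{\max}(\Lambda)$$

where the equality at the left occurs when $\beta$ is the e.v.
corresponding to $\lambda_{\min}(\Lambda)$ and at the right when $\beta$ is
the e.v. corresponding to $\lambda_{\max}(\Lambda)$.

Thus,

$$\min_\beta \rho_2(1,2;y) = \min\left\{\frac{\lambda_{\min}^{1/4}(\Lambda)}{\{\tfrac12(1+\lambda_{\min}(\Lambda))\}^{1/2}},\ \frac{\lambda_{\max}^{1/4}(\Lambda)}{\{\tfrac12(1+\lambda_{\max}(\Lambda))\}^{1/2}}\right\} \qquad (3.3.15)$$

Finally (3.3.14) follows easily if we observe that, with
$\lambda_{\min}(\Lambda) = a_1$ and $\lambda_{\max}(\Lambda) = b_1$,

$$\frac{a_1^{1/4}}{(1+a_1)^{1/2}} > \frac{b_1^{1/4}}{(1+b_1)^{1/2}}$$

$$\Leftrightarrow\ a_1 + a_1b_1^2 > b_1 + b_1a_1^2$$

$$\Leftrightarrow\ a_1 - b_1 > a_1b_1(a_1-b_1)$$

$$\Leftrightarrow\ a_1b_1 > 1.$$

Hence the theorem.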
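
Theorem 3.3.1 can be illustrated numerically. The sketch below assumes NumPy; the two covariance matrices are made up for illustration and the function names are hypothetical. It uses the fact that $\beta = P\alpha$ is an eigenvector of $\Lambda$ exactly when $R_2\alpha = \lambda R_1\alpha$, picks the extreme root according to (3.3.14), and compares the resulting coefficient (3.3.11) with randomly chosen directions.

```python
import numpy as np

def rho_equal_means(alpha, R1, R2):
    """Bhattacharyya coefficient (3.3.11) of y = alpha'x when delta = 0."""
    q1, q2 = alpha @ R1 @ alpha, alpha @ R2 @ alpha
    return (q1 * q2) ** 0.25 / np.sqrt(0.5 * (q1 + q2))

def optimal_direction_equal_means(R1, R2):
    """Pick the eigenvector for lambda_min or lambda_max as in (3.3.14)."""
    L = np.linalg.cholesky(R1)                       # R1 = L L'
    L_inv = np.linalg.inv(L)
    lam, Q = np.linalg.eigh(L_inv @ R2 @ L_inv.T)    # ascending roots of R2 R1^{-1}
    k = 0 if lam[0] * lam[-1] <= 1.0 else -1
    alpha = L_inv.T @ Q[:, k]                        # satisfies R2 alpha = lam[k] R1 alpha
    return alpha, lam[k]

# Illustrative covariance matrices (not from the thesis)
R1 = np.array([[2.0, 0.5], [0.5, 1.0]])
R2 = np.array([[1.0, 0.2], [0.2, 3.0]])
alpha_opt, lam_opt = optimal_direction_equal_means(R1, R2)
rho_opt = rho_equal_means(alpha_opt, R1, R2)

# No randomly drawn direction should give a smaller coefficient
rng = np.random.default_rng(1)
rho_random = min(rho_equal_means(a, R1, R2) for a in rng.normal(size=(2000, 2)))
```

Since a smaller coefficient means a larger Bhattacharyya distance, `rho_opt` should not exceed `rho_random` for any set of trial directions.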


Next we consider the following problem.

(2) Let $\delta = 0$

and

$$R_1 = R_2 + \mu\mu' \ \text{ for some } \mu \ne 0. \qquad (3.3.16)$$

This model occurs naturally in radar problems which we shall
consider in Chapter VI. In this case the optimal $\alpha$ in the
sense of maximizing the Bhattacharyya distance is given
by the following theorem.

Theorem 3.3.2 : If the difference of the mean vectors
of X under $H_1$ and $H_2$ is null and $R_1$ and $R_2$ are related
by (3.3.16), then the optimum $\alpha$ is given by

$$\alpha = R_2^{-1}\mu.$$

Proof : It follows immediately once we recognize

$$\rho_2(1,2;y) = \frac{\{(\alpha'R_1\alpha)(\alpha'R_2\alpha)\}^{1/4}}{\{\tfrac12\alpha'(2R_2+\mu\mu')\alpha\}^{1/2}}$$

as of the form

$$y = \frac{(1+x)^{1/4}}{(1+\tfrac12 x)^{1/2}}, \qquad x = \frac{(\alpha'\mu)^2}{\alpha'R_2\alpha},$$

which decreases as x increases for x > 0 and thus our
problem reduces to:

$$\max_{\alpha}\ (\alpha'\mu)^2/(\alpha'R_2\alpha).$$

Example 3 .3 ,3 . Take ^ 2 ^ ^^0^1/2^ * ’^“2 — 


(3.3.17) 





Thus , 



Since the optimal a is R 


1 / 2 , 

1 



1 

(1 + 

^ = 0.92 

Cl + 2 

and from (3.3.15),

$$\min \rho_2(1,2;y) = \min\{0.92,\ 1.0\} = 0.92.$$

Thus the two results agree, as expected.

(3) Let $\delta = 0$

and $R_1 = R_2 + R_3$, where $R_3$ is a p.d. matrix. (3.3.18)

In Chapter VI, we shall describe a model which gives rise
to the above relation between $R_1$ and $R_2$. In this case
the following theorem states what the optimum $\alpha$ is.

Theorem 3.3.3 : Let the difference of the mean vectors of
X under $H_1$ and $H_2$ be null and $R_1$ and $R_2$ be related according
to (3.3.18). Then the required optimal $\alpha$ is the eigenvector
corresponding to the maximum eigenvalue of $R_2^{-1}R_3$.

Proof : By virtue of (3.3.18),

$$\rho_2(1,2;y) = \Big(1 + \frac{\alpha'R_3\alpha}{\alpha'R_2\alpha}\Big)^{1/4} \Big/ \Big(1 + \frac12\,\frac{\alpha'R_3\alpha}{\alpha'R_2\alpha}\Big)^{1/2} \qquad (3.3.19)$$

Now, (3.3.19) is of the form

$$y = \frac{(1+x)^{1/4}}{(1+\tfrac12 x)^{1/2}}$$

which decreases as x increases for x > 0.
Thus our problem reduces to finding

$$\max_{\alpha}\ \frac{\alpha'R_3\alpha}{\alpha'R_2\alpha} \qquad (3.3.20)$$

Hence the theorem.

3.3.4 OTHER LINEAR DISCRIMINANT FUNCTIONS 

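Various other distances have been considered by
Kullback in the context of linear discriminant functions ([31]).
The following expressions are of interest:

$$I(1,2;y) = \frac12\ln\frac{\alpha'R_2\alpha}{\alpha'R_1\alpha} - \frac12 + \frac12\,\frac{\alpha'R_1\alpha}{\alpha'R_2\alpha} + \frac12\,\frac{(\alpha'\delta)^2}{\alpha'R_2\alpha} \qquad (3.3.21)$$

$$I(2,1;y) = \frac12\ln\frac{\alpha'R_1\alpha}{\alpha'R_2\alpha} - \frac12 + \frac12\,\frac{\alpha'R_2\alpha}{\alpha'R_1\alpha} + \frac12\,\frac{(\alpha'\delta)^2}{\alpha'R_1\alpha} \qquad (3.3.22)$$

$$J(1,2;y) = \frac12\,\frac{\alpha'R_1\alpha}{\alpha'R_2\alpha} + \frac12\,\frac{\alpha'R_2\alpha}{\alpha'R_1\alpha} - 1 + \frac12\Big(\frac{1}{\alpha'R_1\alpha} + \frac{1}{\alpha'R_2\alpha}\Big)(\alpha'\delta)^2 \qquad (3.3.23)$$

The value of $\alpha$ for which I(1,2;y) in (3.3.21) is a maximum
satisfies (by the usual calculus procedures) an equation of
the same form as (3.3.6) but with

$$-\theta = \frac{\alpha'R_1\alpha}{\alpha'R_2\alpha}\Big(\frac{(\alpha'\delta)^2}{\alpha'R_2\alpha - \alpha'R_1\alpha} - 1\Big) \qquad (3.3.24)$$

The value of $\alpha$ for which I(2,1;y) in (3.3.22) is a
maximum satisfies (by the usual calculus procedures) an
equation of the same form as (3.3.6) but with

$$-\theta = \frac{\alpha'R_1\alpha\,(\alpha'R_1\alpha - \alpha'R_2\alpha)}{\alpha'R_2\alpha\,(\alpha'R_2\alpha - \alpha'R_1\alpha + (\alpha'\delta)^2)} \qquad (3.3.25)$$

The value of $\alpha$ for which J(1,2;y) in (3.3.23) is a
maximum satisfies (by the usual calculus procedures) an
equation of the same form as (3.3.6) but with

$$-\theta = \frac{\alpha'R_1\alpha\,\{(\alpha'R_1\alpha)^2 - (\alpha'R_2\alpha)^2 + (\alpha'\delta)^2(\alpha'R_1\alpha)\}}{\alpha'R_2\alpha\,\{(\alpha'R_2\alpha)^2 - (\alpha'R_1\alpha)^2 + (\alpha'\delta)^2(\alpha'R_2\alpha)\}} \qquad (3.3.26)$$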

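Another important class of linear discriminant functions
arises in the following way. It is clear that associated
with the classification scheme

$$\alpha'x \ \underset{H_2}{\overset{H_1}{\gtrless}}\ c \quad \text{(accept)}$$

are two kinds of errors. The probability of misclassifying
an observation when it comes from the population under $H_1$
is given by

$$e_1 = \Pr{}_1(\alpha'x < c) = \Pr{}_1\Big(\frac{\alpha'x - \alpha'\mu_1}{(\alpha'R_1\alpha)^{1/2}} < \frac{c - \alpha'\mu_1}{(\alpha'R_1\alpha)^{1/2}}\Big) = \Phi\Big(\frac{c - \alpha'\mu_1}{(\alpha'R_1\alpha)^{1/2}}\Big) = 1 - \Phi\Big(\frac{\alpha'\mu_1 - c}{(\alpha'R_1\alpha)^{1/2}}\Big) \qquad (3.3.27)$$

and the probability of misclassifying an observation
when it comes from the population under $H_2$ is given by

$$e_2 = \Pr{}_2(\alpha'x > c) = 1 - \Pr{}_2(\alpha'x < c) = 1 - \Phi\Big(\frac{c - \alpha'\mu_2}{(\alpha'R_2\alpha)^{1/2}}\Big) \qquad (3.3.28)$$

Define

$$y_1 = \frac{\alpha'\mu_1 - c}{(\alpha'R_1\alpha)^{1/2}} \quad \text{and} \quad y_2 = \frac{c - \alpha'\mu_2}{(\alpha'R_2\alpha)^{1/2}} \qquad (3.3.29)$$

We can then form a minimum error criterion for finding
a linear discriminant function, namely, for a given $e_1$, what
linear function of the x's will minimize $e_2$? Since $e_1$ and
$e_2$ are monotone functions of $y_1$ and $y_2$ respectively, it is
simpler to work with the latter.
$y_2$ can be rewritten as

$$y_2 = \frac{\alpha'\delta - y_1(\alpha'R_1\alpha)^{1/2}}{(\alpha'R_2\alpha)^{1/2}} \qquad (3.3.30)$$

Thus for a given $e_1$, $e_2$ will be minimized by maximizing
(3.3.30). The usual calculus procedures lead to the
equation

$$\alpha = (R_1 - \theta R_2)^{-1}\delta$$

where

$$-\theta = \Big(\frac{y_2}{y_1}\Big)\Big(\frac{\alpha'R_1\alpha}{\alpha'R_2\alpha}\Big)^{1/2} \qquad (3.3.31)$$

(see Kullback ([31])).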
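
The error probabilities (3.3.27) and (3.3.28) are easy to evaluate for any given $\alpha$ and c. The sketch below assumes NumPy and uses the standard normal distribution function written through `math.erf`; the helper names are hypothetical. The cut-off is chosen from a prescribed $y_1$ using (3.3.29), with the data of Example 3.3.1 and the Anderson-Bahadur coefficient vector quoted in Section 3.3.5 as inputs.

```python
from math import erf, sqrt
import numpy as np

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cutoff_for_y1(alpha, y1, mu1, R1):
    """Cut-off c giving a prescribed y1, from (3.3.29)."""
    return alpha @ mu1 - y1 * sqrt(alpha @ R1 @ alpha)

def linear_rule_errors(alpha, c, mu1, R1, mu2, R2):
    """Return (e1, e2, y1, y2) for the rule 'accept H1 if alpha'x >= c'."""
    y1 = (alpha @ mu1 - c) / sqrt(alpha @ R1 @ alpha)
    y2 = (c - alpha @ mu2) / sqrt(alpha @ R2 @ alpha)
    return 1.0 - Phi(y1), 1.0 - Phi(y2), y1, y2

# Data of Example 3.3.1 with the LDF coefficients of (3.3.35)
mu1 = np.array([20.80, 12.32])
R1 = np.array([[6.92, -5.27], [-5.27, 40.89]])
mu2 = np.array([12.80, 36.40])
R2 = np.array([[36.75, 13.92], [13.92, 287.92]])
alpha = np.array([1.0, -0.4173])

c = cutoff_for_y1(alpha, 1.645, mu1, R1)
e1, e2, y1, y2 = linear_rule_errors(alpha, c, mu1, R1, mu2, R2)
```

With $y_1 = 1.645$ the first error is fixed near 0.05 by construction, and the value of $y_2$ returned here coincides with the right-hand side of (3.3.30).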

Remark 3.3.5 : We note that the linear discriminant
functions derived from the minimum error criterion were
extensively studied by Anderson and Bahadur ([4]).

Remark 3.3.6 : If $\theta$ found by maximizing $-\ln\rho_2(1,2;y)$
makes $R_\theta$ positive definite and the threshold c of the test
(3.2.2) is chosen according to the following relation

c = f 

then our procedure is admissible (within the class of linear
procedures). This follows from the Anderson-Bahadur theorem
on the admissible class of linear procedures (see Remark 3.6.3,
where we describe it in some detail). All other procedures
considered above also become admissible following the same
approach, and that the A-B procedure is admissible is proven
in ([4]).

3.3.5 COMPARISON OF THE VARIOUS LINEAR DISCRIMINANT FUNCTIONS 

First we consider the example of Section 3.3.1.
Having done this we discuss examples based on autoregressive
processes.

(a) The LDFs listed below are obtained in ([31])
for the example given in Section 3.3.1
(number of iterations taken is 3):

max I(1,2;y) :   $y = x_1 - 0.3924\,x_2$   (3.3.32)

max I(2,1;y) :   $y = x_1 - 0.8491\,x_2$   (3.3.33)

max J(1,2;y) :   $y = x_1 - 0.6295\,x_2$   (3.3.34)

max $y_2$ (given $y_1$ = 1.645 or equivalently $e_1$ = 0.05) :   $y = x_1 - 0.4173\,x_2$   (3.3.35)

max $y_2$ (given $y_1$ = 1.0 or equivalently $e_1$ = 0.16) :   $y = x_1 - 0.3990\,x_2$   (3.3.36)

From Example 3.3.1, we have obtained,

max $(-\ln\rho_2(1,2;y))$ :   $y = x_1 - 0.4153\,x_2$   (3.3.37)

In Table 3.2 the errors of misclassification of
one kind given the other kind, that result from the
use of the above linear discriminant functions, are presented.

52 


The results of all columns excepting the first are collected 
from Kullback ([3l])» The linear discriminant function in 
the last column is found by pooling variances and covariances 
between the samples and proceeding as if the covariance 
matrices were the same. 

It is clear from Table 3.2 that maximizing I(2,l*y) 
and J(l,2jy) yields the LDFs which have larger errors of 
misclassif ication than the other five for whom the errors 
of misclassif ication are very much alike. 

(b) In this section we present some comparisons for
a class of problems where the basic process under consideration
follows an autoregressive scheme. In the following examples,
we study the convergence and the rate of convergence of the
iteration processes involved ((3.3.6), (3.3.24), (3.3.25), (3.3.26), (3.3.31))
and also compute the type II error $e_2$ resulting from the
use of the LDFs obtained by maximizing the distances and by
the Anderson-Bahadur (denoted by A-B) procedure for a given
type I error $e_1$. We take the underlying process to
be a non-stationary autoregressive (AR) process of order
one or two. We divide the examples into two parts depending
upon the order of the AR process.

(i) A first order AR scheme is defined as

$Z(t+1) = \beta Z(t) + \varepsilon(t+1)$, (t = 0,1,...,n-1,...)   (3.3.38)

where $\{Z(t), t \ge 0\}$ is the centered process, i.e.

$Z(t) = X(t) - \mu(t)$



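and

$E\varepsilon(t) = 0 \ \ \forall\, t$

$E\varepsilon(t)\varepsilon(t+\tau) = 0$ if $\tau \ne 0$, and $= 1$ if $\tau = 0$.

By repeated substitution of

$Z(t-i) = \beta Z(t-i-1) + \varepsilon(t-i)$ for i = 0,...,n-1

in (3.3.38), we obtain

$Z(t+1) = \beta^n Z(t+1-n) + \sum_{i=0}^{n-1}\beta^i\varepsilon(t+1-i)$

or

$Z(t) = \beta^n Z(t-n) + \sum_{i=0}^{n-1}\beta^i\varepsilon(t-i)$   (3.3.38)'

Assume Z(0) = 0.

Then (3.3.38)' reduces to (see Bhat [10])

$Z(n) = \sum_{i=0}^{n-1}\beta^i\varepsilon(n-i)$

Thus

$EZ(n)Z(n+\tau) = E\left(\sum_{i=0}^{n-1}\beta^i\varepsilon(n-i)\right)\left(\sum_{j=0}^{n+\tau-1}\beta^j\varepsilon(n+\tau-j)\right)$

$= \sum_{i=0}^{n-1}\sum_{j=0}^{n+\tau-1}\beta^{i+j}E\varepsilon(n-i)\varepsilon(n+\tau-j)$

$= \sum_{i=0}^{n-1}\beta^i\beta^{i+\tau}$

$= \beta^{\tau}\sum_{i=0}^{n-1}(\beta^2)^i$

$= \beta^{\tau}\,\dfrac{1-\beta^{2n}}{1-\beta^2}$

i.e. $EZ(i)Z(j) = \dfrac{1-\beta^{2i}}{1-\beta^2}\,\beta^{(j-i)}$ for $i \le j$.   (3.3.39)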
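The closed form (3.3.39) can be checked numerically against the defining sum. The following is a minimal sketch, assuming Z(0) = 0 and unit innovation variance as in the text; the function names are illustrative only and do not come from the thesis.

```python
def ar1_cov(beta: float, i: int, j: int) -> float:
    """E Z(i)Z(j) from (3.3.39), assuming Z(0) = 0 and unit innovation variance."""
    i, j = min(i, j), max(i, j)
    if abs(beta) == 1.0:
        # Limiting case of the geometric sum: i terms of modulus one.
        return float(i) * beta ** (j - i)
    return beta ** (j - i) * (1.0 - beta ** (2 * i)) / (1.0 - beta ** 2)


def ar1_cov_direct(beta: float, i: int, j: int) -> float:
    """Same quantity by summing beta^k * beta^(k + j - i) over k = 0, ..., i-1."""
    i, j = min(i, j), max(i, j)
    return sum(beta ** k * beta ** (k + j - i) for k in range(i))


def ar1_cov_matrix(beta: float, n: int) -> list:
    """n x n covariance matrix of (Z(1), ..., Z(n))."""
    return [[ar1_cov(beta, i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]


# Covariance matrix for three observations with beta = 0.9.
R = ar1_cov_matrix(0.9, 3)
```

A matrix built this way for each hypothesis is the kind of input the discriminant computations of this section require.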





We determine $\{\mu(t), t \ge 0\}$ to satisfy the first order
difference equation

$\mu(t+1) = \beta\,\mu(t)$   (3.3.40)

Now we consider the following examples.

Example 3.3.4:  Under $H_1$: $\beta = .9$, $\mu_0 = 1$
                Under $H_2$: $\beta = -.9$, $\mu_0 = 0$

Example 3.3.5:  Under $H_1$: $\beta = .2$
                Under $H_2$: $\beta = .7$

Example 3.3.6:  Under $H_1$: $\beta = .2$
                Under $H_2$: $\beta = .15$


The $\delta$ vector is the same in all the above examples, where the
sample size varies from 2 to 20. The distinguishing feature
of the examples is that the $\beta$'s differ significantly in
Example 3.3.4, differ moderately in Example 3.3.5, and are
very close in Example 3.3.6. The computations
are shown in the following tables. The initial value of $\theta$
in the iterations is -1 in all the examples; n denotes the
number of observations and "iter" denotes the number of
iterations required to get the optimal $\theta$ (denoted by $\hat\theta$).

The comparisons of the performances of the LDFs are also
shown graphically in the figures. We first present the
graphs to give an overall view.










Table 3.3

Type II errors $e_2$ resulting from the use of LDFs obtained
by maximizing $-\ln\rho_2(1,2;y)$

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.4     49.5765     5      2    1.0       1.0       1.0
          31.6438     6      3    1.0       1.0       1.0
          36.2683     6      4    1.0       1.0       1.0
          43.6417     6      5    1.0       1.0       1.0
          89.8485     6      10   1.0       1.0       1.0
          57.5591     8      20   1.0       1.0       1.0
3.3.5     -.2252      4      2    0.8159    0.6368    0.4286
          -.0707      5      3    0.7486    0.5753    0.3974
          -.0153      5      4    0.7019    0.5398    0.3783
          -.0101      5      5    0.6700    0.5160    0.3669
          .0396       3      10   0.6000    0.4700    0.3450
          .0457       3      20   0.5800    0.4600    0.3400
3.3.6     -1.1327     2      2    0.8708    0.6664    0.4090
          -1.1480     2      3    0.8365    0.6104    0.3446
          -1.1455     2      4    0.8078    0.5636    0.3050
          -1.1403     2      5    0.7852    0.5279    0.2743
          -1.1223     2      10   0.7231    0.4701    0.2265
          -1.1148     2      20   0.6800    0.4000    0.1800


Table 3.4

Type II errors $e_2$ resulting from the use of LDFs obtained
by maximizing I(1,2;y)

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.4     10.3138     3      2    1.0       1.0       1.0
          18.3408     4      3    1.0       1.0       1.0
          27.3149     5      4    1.0       1.0       1.0
          36.8808     6      5    1.0       1.0       1.0
          18.1935     6      10   1.0       1.0       1.0
          12.8295     7      20   1.0       1.0       1.0
3.3.5     -2.0212     7      2    0.8365    0.6554    0.4404
          -0.7370     7      3    0.7734    0.5910    0.4090
          -0.3578     9      4    0.7257    0.5517    0.3783
          -0.2012     9      5    0.6879    0.5239    0.3669
          -0.0305     9      10   0.6100    0.4800    0.3500
          -0.0020     10     20   0.6000    0.4600    0.3450
3.3.6     34.5554     3      2    0.9998    0.9983    0.9884
          31.7064     3      3    0.9998    0.9990    0.9929
          32.7184     3      4    1.0000    0.9993    0.9951
          33.9604     3      5    1.0000    0.9997    0.9967
          38.8674     3      10   1.0000    1.0000    1.0000
          41.3886     3      20   1.0000    1.0000    1.0000


Table 3.5

Type II errors $e_2$ resulting from the use of LDFs obtained
by maximizing I(2,1;y)

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.4     -83.2602    6      2    1.0       1.0       1.0
          36.0286     6      3    1.0       1.0       1.0
          37.3797     6      4    1.0       1.0       1.0
          44.1045     6      5    1.0       1.0       1.0
          89.8911     8      10   1.0       1.0       1.0
          57.0704     6      20   1.0       1.0       1.0
3.3.5     0.1781      3      2    0.8106    0.6331    0.4325
          0.1776      3      3    0.7454    0.5793    0.4052
          0.1630      3      4    0.6985    0.5438    0.3897
          0.1499      3      5    0.6700    0.5398    0.3821
          0.1176      4      10   0.6100    0.4850    0.3700
          0.1056      4      20   0.6000    0.4800    0.3700
3.3.6     -0.0335     2      2    0.8708    0.6664    0.4090
          -0.0376     2      3    0.8365    0.6104    0.3446
          -0.0369     2      4    0.8078    0.5636    0.3050
          -0.0359     2      5    0.7852    0.5279    0.2743
          -0.0322     2      10   0.7231    0.4701    0.2265
          -0.0304     2      20   0.6800    0.4000    0.1800


Table 3.6

Type II errors $e_2$ resulting from the use of LDFs obtained
by maximizing J(1,2;y)

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.4     11.4294     4      2    1.0       1.0       1.0
          18.7921     4      3    1.0       1.0       1.0
          27.5608     5      4    1.0       1.0       1.0
          37.0316     6      5    1.0       1.0       1.0
          18.2201     6      10   1.0       1.0       1.0
          12.8339     4      20   1.0       1.0       1.0
3.3.5     -0.0156     4      2    0.8159    0.6368    0.4286
          0.1019      4      3    0.7486    0.5753    0.3974
          0.1228      4      4    0.7454    0.5398    0.3859
          0.1243      4      5    0.6985    0.5160    0.3783
          0.1089      4      10   0.6100    0.4850    0.3700
          0.1004      3      20   0.6000    0.4700    0.3600
3.3.6     -1.2462     2      2    0.8708    0.6664    0.4090
          -1.3145     2      3    0.8365    0.6104    0.3446
          -1.3367     2      4    0.8078    0.5636    0.3080
          -1.3492     2      5    0.7852    0.5279    0.2743
          -1.3636     2      10   0.7231    0.4701    0.2265
          -1.3649     2      20   0.6800    0.4000    0.1800


Table 3.7

Type II errors $e_2$ resulting from the use of LDFs obtained
by the A-B procedure

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.4     2.2411      4      2    1.0       1.0       0.6484
          5.8541      11     3    1.0       1.0       0.5000
          1.2776      4      4    1.0       1.0       0.5000
          2.2267      9      5    1.0       1.0       0.5000
          0.0000      10     10   1.0       1.0       0.5000
          -0.0020     8      20   1.0       1.0       0.5000
3.3.5     0.1639      3      2    0.8106    0.6331    0.4286
          0.0786      3      3    0.7454    0.5753    0.3974
          0.0378      3      4    0.6985    0.5359    0.3783
          0.0152      3      5    0.6700    0.5160    0.3669
          0.0546      4      10   0.6000    0.4700    0.3400
          -0.2392     4      20   0.5800    0.4600    0.3300
3.3.6     0.2704      2      2    0.8708    0.6664    0.4090
          0.1761      2      3    0.8365    0.6104    0.3446
          0.1039      2      4    0.8078    0.5636    0.3080
          0.0483      2      5    0.7852    0.5279    0.2743
          -0.8701     2      10   0.7231    0.4701    0.2265
          -0.9715     2      20   0.6800    0.4000    0.1800


We can draw the following conclusions from the above
Tables:

(1) In Example 3.3.4, all the LDFs resulting from
maximizing $-\ln\rho_2(1,2;y)$, I(1,2;y), I(2,1;y), J(1,2;y) and
that due to the Anderson-Bahadur (A-B) procedure accept $H_1$,
except the LDF obtained by the A-B procedure when the
type I error is .16, in which case the type II error is nearly
.5000. The increase in the sample size up to 20 has no
effect on the performances.

(2) In Example 3.3.5, the LDFs obtained by maximizing
$-\ln\rho_2(1,2;y)$ do better than the others from the error point
of view, and their performance is the same as that of the A-B
procedure in almost all cases. As n increases, the performances
improve.

(3) In Example 3.3.6, all the LDFs have the same performance
except the LDF due to I(1,2;y), which does not do well. Here
also, the performances improve as n increases.

(4) (i) In Example 3.3.4, the convergence of the iteration
in maximizing J(1,2;y) and I(1,2;y) is most rapid.

(ii) In Example 3.3.5, the number of iterations
required to get an optimal $\theta$ is highest in maximizing
I(1,2;y) and lowest in the A-B procedure and in maximizing
I(2,1;y).

(iii) The number of iterations taken in maximizing
$-\ln\rho_2(1,2;y)$, I(2,1;y), J(1,2;y) and in the A-B procedure is
the same in Example 3.3.6, and is lower than that of
maximizing I(1,2;y).


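(ii) A second order AR scheme is given by

$Z(t+2) = \beta_1 Z(t+1) + \beta_2 Z(t) + \varepsilon(t+2)$, (t = 0,1,...,n-1,...)   (3.3.41)

Define an operator F by

$FZ(t) = Z(t+1)$

Thus (3.3.41) can be written as

$(F^2 - \beta_1F - \beta_2)Z(t) = \varepsilon(t+2)$

The homogeneous part of this equation is

$(F^2 - \beta_1F - \beta_2)Z(t) = 0$,

which has a general solution

$Z(t) = \lambda\xi_1^t + M\xi_2^t$,

where $\xi_1$, $\xi_2$ are the solutions of the operator equation known
as the characteristic equation:

$F^2 - \beta_1F - \beta_2 = 0$.

We have

$\xi_1 = \dfrac{\beta_1 + (\beta_1^2 + 4\beta_2)^{1/2}}{2}$,   $\xi_2 = \dfrac{\beta_1 - (\beta_1^2 + 4\beta_2)^{1/2}}{2}$

Assume $\beta_1^2 + 4\beta_2 > 0$, so that $\xi_1$ and $\xi_2$ are real numbers. A
particular solution is obtained by writing

$Z(t) = [(F-\xi_1)^{-1}(F-\xi_2)^{-1}]\,\varepsilon(t+2)$

$= \left[\sum_{r=0}^{\infty}\left(\dfrac{\xi_1}{F}\right)^r \sum_{s=0}^{\infty}\left(\dfrac{\xi_2}{F}\right)^s\right]\varepsilon(t)$

$= \sum_{r=0}^{\infty}\sum_{s=0}^{\infty}\xi_1^r\xi_2^s\,\varepsilon(t-r-s)$.

The complete solution of the difference equation (3.3.41)
is the sum of the general solution of the homogeneous part
and a particular solution.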

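Thus, we get

$Z(t) = \lambda\xi_1^t + M\xi_2^t + \sum_{r=0}^{\infty}\sum_{s=0}^{\infty}\xi_1^r\xi_2^s\,\varepsilon(t-r-s)$

Assume Z(0) = Z(1) = 0.

Thus $\lambda$ and M can be obtained as

$\lambda = -\dfrac{1}{\xi_1-\xi_2}\sum_{r=0}^{\infty}\xi_1^r\,\varepsilon(1-r)$,   $M = \dfrac{1}{\xi_1-\xi_2}\sum_{s=0}^{\infty}\xi_2^s\,\varepsilon(1-s)$

Hence, after doing considerable simplification, we obtain
(see Bhat [10])

$Z(t) = \sum_{i=0}^{t-1}a_i\,\varepsilon(t-i)$

where

$a_i = \dfrac{\xi_1^{i+1} - \xi_2^{i+1}}{\xi_1 - \xi_2}$

Finally,

$EZ(t)Z(t+h) = E\left(\sum_{i=0}^{t-1}a_i\varepsilon(t-i)\right)\left(\sum_{j=0}^{t+h-1}a_j\varepsilon(t+h-j)\right)$

$= \sum_i\sum_j a_ia_j\,E\varepsilon(t-i)\varepsilon(t+h-j)$

$= \sum_{i=0}^{t-1}a_ia_{i+h}$

$= \dfrac{1}{(\xi_1-\xi_2)^2}\left\{\dfrac{(1-\xi_1^{2t})\,\xi_1^{h+2}}{1-\xi_1^2} + \dfrac{(1-\xi_2^{2t})\,\xi_2^{h+2}}{1-\xi_2^2} - \dfrac{\xi_1\xi_2\,(\xi_1^h+\xi_2^h)\,(1-(\xi_1\xi_2)^t)}{1-\xi_1\xi_2}\right\}$   (3.3.42)

We determine $\{\mu(t)\}$ as a solution of the difference equation

$\mu(t+2) = \beta_1\mu(t+1) + \beta_2\mu(t)$.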
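The AR(2) quantities above are easy to tabulate numerically. The sketch below computes the characteristic roots, the moving-average weights $a_i$, and the covariance as the finite sum $\sum_{i=0}^{t-1} a_i a_{i+h}$. The coefficient values used are hypothetical (chosen only so that $\beta_1^2 + 4\beta_2 > 0$), and the function names are not from the thesis.

```python
import math


def ar2_roots(b1: float, b2: float):
    """Roots xi1, xi2 of F^2 - b1*F - b2 = 0; needs b1^2 + 4*b2 > 0."""
    disc = b1 * b1 + 4.0 * b2
    if disc <= 0.0:
        raise ValueError("b1^2 + 4*b2 must be positive for distinct real roots")
    s = math.sqrt(disc)
    return (b1 + s) / 2.0, (b1 - s) / 2.0


def a_coeff(i: int, xi1: float, xi2: float) -> float:
    """Weight a_i = (xi1^(i+1) - xi2^(i+1)) / (xi1 - xi2)."""
    return (xi1 ** (i + 1) - xi2 ** (i + 1)) / (xi1 - xi2)


def ar2_cov(b1: float, b2: float, t: int, h: int) -> float:
    """E Z(t)Z(t+h) as the sum of a_i * a_(i+h) over i = 0, ..., t-1."""
    xi1, xi2 = ar2_roots(b1, b2)
    return sum(a_coeff(i, xi1, xi2) * a_coeff(i + h, xi1, xi2) for i in range(t))


# Hypothetical coefficients, used only for illustration.
b1, b2 = 0.7, 0.1
xi1, xi2 = ar2_roots(b1, b2)
weights = [a_coeff(i, xi1, xi2) for i in range(6)]
```

The weights satisfy the same recursion as the process itself, $a_i = \beta_1 a_{i-1} + \beta_2 a_{i-2}$ with $a_0 = 1$ and $a_1 = \beta_1$, which gives a quick consistency check on the root formula.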

Pv^ninTe 3 .3 o2 *. Under *. = .7 , ^ “ 

Under H 2 i =-.7, ^2 ~ -*1 ’^0 “ ^ 

Fra.nle 3 .3. 8. t Under i 9^ = -T • 9>2 " ‘I’ 

Under Hj : • ?2 " ''’I 

Fv.mnie 3.3.9 S Under Hj^ = Pi = •''> ^2 “ 

Under ^2 * ^2 ” 

The $\delta$ vector is the same in all the examples above.

We furnish computations similar to those for the
AR(1) examples in the following Tables.













Table 3.8

Results due to $-\ln\rho_2(1,2;y)$

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.7     -1.0000     1      2    0.9082    0.7389    0.5000
          0.1389      9      3    0.7157    0.6141    0.5120
          0.1152      10     4    0.7190    0.6331    0.5120
          0.1943      11     5    0.7734    0.6480    0.5120
          0.1760      11     10   0.7000    0.6000    0.4800
          0.1539      10     20   0.6900    0.5900    0.4500
3.3.8     -0.1416     5      2    0.8051    0.6293    0.4325
          -0.1417     5      3    0.8051    0.6293    0.4325
          -0.1510     4      4    0.8051    0.6293    0.4325
          -0.1512     3      5    0.8051    0.6293    0.4325
          -0.1654     3      10   0.7831    0.6103    0.4209
          -0.1720     3      20   0.7054    0.5980    0.4116
3.3.9     -0.8802     2      2    0.8413    0.6591    0.4129
          -0.8816     2      3    0.8413    0.6591    0.4129
          -0.8817     2      4    0.8413    0.6591    0.4129
          -0.8816     2      5    0.8413    0.6591    0.4129
          -0.8819     2      10   0.8456    0.6580    0.4089
          -0.8881     2      20   0.8350    0.6572    0.4100


Table 3.9

Results due to I(1,2;y)

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.7     -           -      2    -         -         -
          8.2614      8      3    1.0       1.0       1.0
          3.4601      10     4    1.0       1.0       1.0
          3.9963      5      5    1.0       1.0       1.0
          4.6401      6      10   1.0       1.0       1.0
          5.4059      5      20   1.0       1.0       1.0
3.3.8     -1.8069     11     2    0.8365    0.6554    0.4443
          -2.0192     14     3    0.8305    0.6554    0.4443
          -1.9102     10     4    0.8365    0.6554    0.4443
          -2.3411     9      5    0.3365    0.6554    0.4443
          -2.4500     6      10   0.8229    0.6417    0.4332
          -1.1999     5      20   0.8100    0.6339    0.4200
3.3.9     -29.2505    3      2    0.8413    0.6591    0.4129
          -30.1821    4      3    0.8413    0.6591    0.4129
          -30.2233    4      4    0.8413    0.6591    0.4129
          -30.3009    4      5    0.8414    0.6591    0.4129
          -30.4103    4      10   0.8414    0.6591    0.4129
          -30.4350    4      20   0.8591    0.6591    0.4129


Table 3.10

Results due to I(2,1;y)

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.7     0.1829      4      2    0.7324    0.6026    0.4602
          0.1829      4      3    0.7357    0.7324    0.5000
          0.2622      5      4    0.8159    0.6484    0.9000
          0.2157      5      5    0.7764    0.6554    0.5000
          0.1783      5      10   0.7700    0.6700    0.5500
          0.1661      3      20   0.8300    0.7400    0.6500
3.3.8     0.1913      3      2    0.8361    0.6413    0.4637
          0.1605      4      3    0.8361    0.6413    0.4637
          0.1707      4      4    0.8361    0.6413    0.4637
          0.1812      6      5    0.8361    0.6413    0.4637
          0.2001      8      10   0.8301    0.6356    0.4501
          0.2239      10     20   0.8229    0.6300    0.4400
3.3.9     0.0296      2      2    0.8413    0.6591    0.4129
          0.0297      2      3    0.8413    0.6591    0.4129
          0.0296      2      4    0.8413    0.6591    0.4129
          0.0296      2      5    0.8413    0.6591    0.4129
          0.0203      2      10   0.8571    0.6582    0.4091
          0.0296      2      20   0.8591    0.6591    0.4120



Table 3.11

Results due to J(1,2;y)

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.7     -1.0000     1      2    0.9099    0.7422    0.5000
          0.1418      7      3    0.7190    0.6217    0.5160
          0.2470      10     4    0.8212    0.6484    0.5280
          0.2106      7      5    0.7764    0.6517    0.5160
          0.1779      7      10   0.7700    0.6800    0.5800
          0.1539      8      20   0.7700    0.6600    0.5500
3.3.8     0.0521      5      2    0.8215    0.6293    0.4443
          0.0434      5      3    0.8215    0.6293    0.4443
          0.0356      4      4    0.8215    0.6293    0.4443
          0.0341      3      5    0.8215    0.6293    0.4443
          0.0200      3      10   0.8200    0.6201    0.4300
          0.0010      3      20   0.8100    0.6109    0.4205
3.3.9     -0.8019     2      2    0.8413    0.6591    0.4129
          -0.8040     2      3    0.8413    0.6591    0.4129
          -0.8041     2      4    0.8413    0.6591    0.4129
          -0.8041     2      5    0.8413    0.6591    0.4129
          -0.8142     2      10   0.8413    0.6591    0.4129
          -0.8152     2      20   0.8411    0.6591    0.4129


Table 3.12

Results due to the A-B procedure

Example   θ̂          iter   n    e_1=.01   e_1=.05   e_1=.16
3.3.7     0.4647      6      2    0.7673    0.6406    0.4880
          0.2332      31     3    0.6293    0.5753    0.4880
          0.1435      7      4    0.6133    0.5871    0.4840
          0.3952      7      5    0.6389    0.5120    0.4801
          -0.0947     14     10   0.6100    0.5800    0.4400
          -0.0792     14     20   0.6000    0.5800    0.4400
3.3.8     0.1503      3      2    0.7995    0.6255    0.4325
          0.1621      5      3    0.7995    0.6255    0.4325
          0.1654      5      4    0.7995    0.6255    0.4325
          0.1637      4      5    0.7995    0.6255    0.4325
          0.1801      4      10   0.7800    0.6019    0.4211
          0.1800      4      20   0.6992    0.5801    0.4116
3.3.9     0.2461      2      2    0.8413    0.6591    0.4129
          0.2461      2      3    0.8413    0.6591    0.4129
          0.2460      2      4    0.8413    0.6591    0.4129
          0.2461      2      5    0.8413    0.6591    0.4129
          -0.2106     2      10   0.8351    0.6562    0.4089
          -0.2203     2      20   0.8200    0.6500    0.4000


We observe the following from the above Tables.

(1) In Example 3.3.7, the LDFs due to $-\ln\rho_2(1,2;y)$ do
better than the LDFs obtained by maximizing the other distances, and
the A-B procedure does a little better than this. But finding
the optimal $\theta$ in the A-B procedure requires more iterations.
The maximization of I(1,2;y) does not provide an optimal $\theta$
for n = 2, because during the iteration $\theta$ becomes undefined
when $a'R_1a$ becomes equal to $a'R_2a$.

(2) The number of iterations taken in getting an optimal $\theta$
in Example 3.3.7 is, in general, much higher than that in the other
Examples.

(3) In Example 3.3.8 and Example 3.3.9, all the LDFs have
the same performance up to n = 5, and for n = 10, n = 20 we have
the same conclusion as in Example 3.3.7.

From the above experiments (based on AR(1) and AR(2))
we can infer that the A-B procedure (which is admissible)
does a little better than the LDFs depending on the Bhattacharyya
distance in the majority of the cases considered, and that the LDFs
yielded by the Bhattacharyya distance do better than the LDFs
given by the other distances in almost all the cases. But since
the A-B procedure suffers from computational difficulties
(not only do we have to find the a vector anew for each
$e_1$ value, but it also sometimes takes more computer time
to find it), the Bhattacharyya distance is preferable. This
claim is strengthened, as we shall see in later sections
when we consider covariance stationary time series,
wherein only our criterion of maximizing the Bhattacharyya
distance admits an analytical solution for the required
a-vector asymptotically.

Remark 3.3.7: The above conclusions do not go against
the claim made earlier that our procedure, the procedures
due to maximizing other distances, and the A-B procedure
are all admissible. To see this, let us consider
Example 3.3.1 again. Having obtained the required a's by
maximizing the distances, we find c according to the following
relation

c = , (3.3.43) 

given in the Anderson-Bahadur theorem (see Remark 3.6.3);
thus we can compute $y_1$ and $y_2$. We find $y_2$ due to the
A-B procedure for the given $y_1$ that results from the use
of the LDF yielded by our criterion. The computations are
shown in Table 3.13.

Table 3.13 (Admissibility in Example 3.3.1)

        -ln ρ_2(1,2;y)   I(1,2;y)   I(2,1;y)   J(1,2;y)   A-B
y_1     1.5600           0.7265     5.1575     4.2715     1.5600
y_2     1.3082           1.7218     -0.4140    -0.1242    1.3082

It is clear from the above Table that our procedure is
admissible and that admissible procedures are not comparable.

It may be pointed out that in the above AR(1) and AR(2)
examples, c was not chosen according to (3.3.43).

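Remark 3.3.8: We have derived the covariance matrices
for AR(1) and AR(2) by solving some difference equations.

They can also be obtained by an interesting method which
we shall describe now.

A pth order autoregressive scheme is defined as

$Z(t+p) = \beta_1Z(t+p-1) + \beta_2Z(t+p-2) + \dots + \beta_pZ(t) + \varepsilon(t+p)$,

(t = 0,1,...,n-1,...)

which can also be written as

$\begin{pmatrix} Z(t+1)\\ Z(t+2)\\ \vdots\\ Z(t+p)\end{pmatrix} = \begin{pmatrix} 0&1&0&\cdots&0\\ 0&0&1&\cdots&0\\ \vdots&\vdots&\vdots&&\vdots\\ 0&0&0&\cdots&1\\ \beta_p&\beta_{p-1}&\cdots&\cdots&\beta_1\end{pmatrix}\begin{pmatrix} Z(t)\\ Z(t+1)\\ \vdots\\ Z(t+p-1)\end{pmatrix} + \begin{pmatrix}0\\ 0\\ \vdots\\ \varepsilon(t+p)\end{pmatrix}$   (3.3.44)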

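Let $Z_{t+p} = (Z(t+1), Z(t+2), \dots, Z(t+p))'$ and
$u_{t+p} = (0, \dots, 0, \varepsilon(t+p))'$, and let B denote the
$p \times p$ coefficient matrix in (3.3.44).

Then (3.3.44) reduces to

$Z_{t+p} = BZ_{t+p-1} + u_{t+p}$

$\Rightarrow Z_n = \sum_{k=0}^{n-1}B^ku_{n-k}$ (on the assumption $Z_0 = 0$).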

ThenHp.p i Ej^yJ = E( Uj_,)(V u'.., B' >=) 

= f I S" (SUl-k 

= ( I B^B' ^)B' where Eu^i^ = L = 

Ic 


0 0...0 

0 ... .0 

0 0...1 


= (\^ B^b’ ^)B' ( where B = BL) 
k=l ^ 

= [ I-(B^B[)^][I-B^B^r^ (3.3.45) 

Hence $EZ(i)Z(j)$ = (i,j)th element of the required covariance
matrix = (p,p)th element of H.
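The matrix formulation lends itself to a simple numerical recursion. The sketch below is an illustration under stated assumptions rather than the closed form (3.3.45): it assumes a zero initial state and unit innovation variance, builds the companion matrix B of (3.3.44), and accumulates the state covariance through $H_n = BH_{n-1}B' + L$, where L has a single 1 in its last diagonal position. All function names are hypothetical.

```python
def companion(betas):
    """Companion matrix of (3.3.44); betas = [beta_1, ..., beta_p]."""
    p = len(betas)
    B = [[0.0] * p for _ in range(p)]
    for r in range(p - 1):
        B[r][r + 1] = 1.0
    B[p - 1] = [betas[p - 1 - c] for c in range(p)]   # beta_p, ..., beta_1
    return B


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def state_cov(betas, n):
    """Covariance of the state vector after n steps from a zero initial state."""
    p = len(betas)
    B = companion(betas)
    Bt = transpose(B)
    L = [[0.0] * p for _ in range(p)]
    L[p - 1][p - 1] = 1.0                              # E u u'
    H = [[0.0] * p for _ in range(p)]
    for _ in range(n):
        BHBt = matmul(matmul(B, H), Bt)
        H = [[BHBt[i][j] + L[i][j] for j in range(p)] for i in range(p)]
    return H


H1 = state_cov([0.9], 4)          # AR(1): reduces to (1 - beta^8) / (1 - beta^2)
H2 = state_cov([0.7, 0.1], 5)     # AR(2) with hypothetical coefficients
```

For p = 1 the recursion reproduces the AR(1) variance in (3.3.39), which is a convenient check before using it for higher orders.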

Remark 3.3.9: The general form of our linear discriminant
function (LDF) is

$y_\theta = \delta'R_\theta^{-1}x$

It follows from (3.3.6).

Remark 3.3.10: Noting the form of a in (3.3.6) we may
restrict our attention to the class

$C = \{a : a = R_\theta^{-1}\delta,\ \theta$ being any scalar such that $R_\theta$ is non-singular$\}$

for an a that would maximize $-\ln\rho_2(1,2;y_\theta)$. Then the maximizing
condition is a polynomial in $\theta$ and the properties of the
roots can be studied. We illustrate this for Example
3.3.1.

Vie have 




(o?o^)^ 




1 


exp % 


1 iil^^Ksee 2.2.3) 

(3.3.46) 


2 . 2 

O1+O2 


where 




R^R 


-1 

© 


6 


6.92-36.750 


= 


I 


One can show easily 


-5.27-13.92 0 1 
40.89-287.920 


,,^- 1 . _ 10112 ^( 0 . 4262 - 4 . 17890 ) 

6 Rq ^ - iRqI 


2 ^ (l2l7^7°^^^2.4176l(o2_n 1 qrq + O.Ol), 

’1 iRgl^ 


2 _ (/|al7ft798~) fd.4176l^^ nooij^-0.8203 9 + 0.845) 

and - iRgl^ 


glnP^ _ ^ reduces to 
Then, after some slmpllf icatrons . _ 

X 5 4. n oBfi* - 0.0242#+ 0.0039 - 0.00028 

n 7506 + 0.070328= + 0.089 

We have solved this polynomial by Graeffe's root-squaring method
(Appendix D). The computer output is given below:

Table 3.14

Polynomial method in Example 3.3.1

Root          Value of the polynomial   Iteration   Conclusion            The sign of
-0.4151910    -0.1208365E-10            5           possibly a root       > 0
0.204327      0.1619732E-04             5           possibly a root       < 0
0.000000      0.0000E+00                5           is a root             < 0
15.657150     0.14638E+08               5           possibly not a root   -

From Table 3.14 we conclude that $-\ln\rho_2(1,2;y_\theta)$ has
a maximum at $\theta = -.415191$, and this is consistent with our
earlier finding (see Section 3.3.1).
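For readers unfamiliar with root squaring, the following is a generic sketch of the Graeffe iteration, not the program of Appendix D. It assumes real coefficients and roots of distinct, nonzero moduli, and it is run on a hypothetical cubic with known roots 1, 2 and 4, since the coefficients of the polynomial above are not legible here. Each step replaces p(x) by the polynomial whose roots are the squares of the roots of p, using $p(x)p(-x) = (-1)^n q(x^2)$.

```python
def graeffe_step(c):
    """One root-squaring step; c[k] is the coefficient of x**k."""
    n = len(c) - 1
    prod = [0.0] * (2 * n + 1)
    for i, ci in enumerate(c):
        for j, cj in enumerate(c):
            prod[i + j] += ci * cj * (-1.0) ** j       # coefficients of p(x) * p(-x)
    sign = (-1.0) ** n
    # Only even powers survive; q(y) has the squared roots of p.
    return [sign * prod[2 * k] for k in range(n + 1)]


def graeffe_moduli(c, steps=5):
    """Estimated root moduli, largest first, after `steps` squarings."""
    q = [float(x) for x in c]
    for _ in range(steps):
        q = graeffe_step(q)
    n = len(q) - 1
    power = 2 ** steps
    # Ratios of successive coefficients approximate |root|**power.
    return [abs(q[n - k] / q[n - k + 1]) ** (1.0 / power) for k in range(1, n + 1)]


# Hypothetical cubic (x - 1)(x - 2)(x - 4) = x^3 - 7x^2 + 14x - 8.
moduli = graeffe_moduli([-8.0, 14.0, -7.0, 1.0], steps=5)
```

The method yields moduli only; the sign of each candidate has to be settled by substituting it back into the polynomial, which is presumably why the table reports the value of the polynomial at each candidate root.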


3.4 COMPARISON OF THE BEHAVIOUR OF THE LDF OBTAINED BY MAXIMIZING
THE BHATTACHARYYA DISTANCE WITH THE QUADRATIC (OPTIMAL) DISCRIMINANT
FUNCTION WHEN $R_2 = dR_1$


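We consider the case when $R_2 = dR_1$, $d > 0$ scalar,
as in ([19]).

Then the LDF obtained by maximizing the Bhattacharyya
distance is given by $\delta'R_1^{-1}x$, since for $R_2 = dR_1$, (3.3.2)
becomes

$\dfrac{1}{2}\ln\left[\dfrac{(d+1)\,a'R_1a}{2\,d^{1/2}\,a'R_1a}\right] + \dfrac{1}{4}\,\dfrac{(a'\delta)^2}{(d+1)\,a'R_1a}$

which in turn equals

$\dfrac{1}{2}\ln\left(\dfrac{\frac{1}{2}(d+1)}{d^{1/2}}\right) + \dfrac{1}{4}\,\dfrac{(a'\delta)^2}{a'[(d+1)R_1]a}$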
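A quick numerical check of this simplification is sketched below. It assumes that (3.3.2) is the usual Bhattacharyya distance between the two univariate normal distributions of $a'x$, with variances $a'R_1a$ and $d\,a'R_1a$; the function names and the numbers are illustrative only.

```python
import math


def bhattacharyya_normal(m1: float, v1: float, m2: float, v2: float) -> float:
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2); v1, v2 are variances."""
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2))))


def distance_when_r2_is_d_r1(a_delta: float, a_r1_a: float, d: float) -> float:
    """Simplified form when the second variance is d times the first."""
    return (0.5 * math.log(0.5 * (d + 1.0) / math.sqrt(d))
            + 0.25 * a_delta ** 2 / ((d + 1.0) * a_r1_a))


# Hypothetical projected quantities: a'delta = 1.3, a'R1a = 0.8, d = 2.5.
general = bhattacharyya_normal(1.3, 0.8, 0.0, 2.5 * 0.8)
special = distance_when_r2_is_d_r1(1.3, 0.8, 2.5)
```

Because the logarithmic term no longer depends on a, maximizing the distance reduces to maximizing $(a'\delta)^2 / a'R_1a$, whose maximizer is proportional to $R_1^{-1}\delta$.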


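Thus our linear procedure is given by

$\delta'R_1^{-1}x \ \underset{H_2}{\overset{H_1}{\gtrless}}\ c$   (accept)   (3.4.1)

Apply the following transformation:

$Y = R_1^{-1/2}(\mu_1 - X)$   (3.4.2)

This reduces $R_1$ to the identity matrix and $R_2$ to $dI$.

Then, $Y \sim N_n(0, I)$ under $H_1$,

$Y \sim N_n(\nu, dI)$ under $H_2$, where $\nu = R_1^{-1/2}(\mu_1 - \mu_2) = R_1^{-1/2}\delta$.   (3.4.3)

Now,

$\dfrac{\delta'R_1^{-1}x}{d+1} = -\dfrac{\nu'R_1^{1/2}R_1^{-1}(\mu_1 - x)}{d+1} + c_1 = -\dfrac{\nu'Y}{d+1} + c_1$,

where $c_1 = \delta'R_1^{-1}\mu_1/(d+1)$ is a constant.

Then (3.4.1) reduces to

$\dfrac{\nu'Y}{d+1} \ \underset{H_2}{\overset{H_1}{\gtrless}}\ c$   (reject)   (3.4.4)

The errors of misclassification associated with (3.4.4)
are given below: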


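$e_1 = 1 - \Phi\left(\dfrac{c}{\{\nu'\nu/(d+1)^2\}^{1/2}}\right) = 1 - \Phi\left(\dfrac{c(d+1)}{\{T^2[\omega + (1-\omega)d]\}^{1/2}}\right)$   (3.4.5)

where

$T^2 = \dfrac{\nu'\nu}{\omega + (1-\omega)d}$   (3.4.6)

and $\omega$ is the prior probability of $H_1$, and

$e_2 = \Phi\left(\dfrac{c - \nu'\nu/(d+1)}{\{d\,\nu'\nu/(d+1)^2\}^{1/2}}\right) = \Phi\left(\dfrac{c - [\omega + (1-\omega)d]T^2/(d+1)}{\{dT^2[\omega + (1-\omega)d]\}^{1/2}/(d+1)}\right)$

$= \Phi\left(\dfrac{c(d+1) - T^2[\omega + (1-\omega)d]}{\{dT^2[\omega + (1-\omega)d]\}^{1/2}}\right)$   (3.4.7)

Now the total probability of misclassification is defined as

$\omega e_1 + (1-\omega)e_2$.   (3.4.8)

The cut-off point c is chosen to minimize (3.4.8).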



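The value of c for which (3.4.8) is a minimum satisfies the
equation (putting the first derivative = 0):

$\dfrac{\omega}{(T^2[\omega + (1-\omega)d])^{1/2}}\,e^{-\frac{1}{2}\frac{c^2(d+1)^2}{T^2[\omega + (1-\omega)d]}} = \dfrac{1-\omega}{\{dT^2[\omega + (1-\omega)d]\}^{1/2}}\,e^{-\frac{1}{2}\frac{\{c(d+1) - T^2[\omega + (1-\omega)d]\}^2}{dT^2[\omega + (1-\omega)d]}}$   (3.4.9)

or

$\ln\left(\dfrac{\omega}{1-\omega}\,d^{1/2}\right) = \dfrac{1}{2}\,\dfrac{c^2(d+1)^2}{T^2[\omega + (1-\omega)d]} - \dfrac{1}{2}\,\dfrac{c^2(d+1)^2 + T^4[\omega + (1-\omega)d]^2 - 2cT^2(d+1)[\omega + (1-\omega)d]}{dT^2[\omega + (1-\omega)d]}$

$= -\dfrac{1}{2}\,\dfrac{(d+1)^2}{dT^2[\omega + (1-\omega)d]}\left[c^2 - dc^2 + T^4\left(\dfrac{\omega + (1-\omega)d}{d+1}\right)^2 - 2cT^2\,\dfrac{\omega + (1-\omega)d}{d+1}\right]$

or,

$(1-d)c^2 - 2cT^2\,\dfrac{\omega + (1-\omega)d}{d+1} + T^4\left(\dfrac{\omega + (1-\omega)d}{d+1}\right)^2 + \dfrac{2dT^2[\omega + (1-\omega)d]}{(d+1)^2}\,\ln\left(\dfrac{\omega}{1-\omega}\,d^{1/2}\right) = 0$   (3.4.10)

That is, the required c satisfies a quadratic equation, the solutions
of which are given by (after some simplifications)

$c = \dfrac{\omega + (1-\omega)d}{(d+1)(1-d)}\left[T^2 \pm (dT^2)^{1/2}\left\{T^2 + \dfrac{d-1}{\omega + (1-\omega)d}\left(\ln d + 2\ln\dfrac{\omega}{1-\omega}\right)\right\}^{1/2}\right]$ when $d \ne 1$   (3.4.11)

and $c = \dfrac{T^2}{4} + \dfrac{1}{2}\ln\dfrac{\omega}{1-\omega}$ when d = 1 (cf. (3.4.10)).   (3.4.12)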


Remark 3.4.1: The only instance in which the optimal cut-off point
is not given by (3.4.11) is when assigning every observation
to the same population yields a lower total probability of
misclassification. This probability will be $\min(\omega, 1-\omega)$ and
can be obtained by making c infinite.

Remark 3.4.2: Once we have found c via (3.4.11), our
desired test (3.4.4) is completely specified.

Remark 3.4.3: The root corresponding to the negative sign
in (3.4.11) gives the required minimum. This can be shown
by computing the second derivative of the total probability
of misclassification.
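The cut-off rule and error formulas of this section can be put together in a few lines. The sketch below is one reading of (3.4.5), (3.4.7), (3.4.11) with the negative sign, and (3.4.12), with the fallback of Remark 3.4.1 when no real cut-off exists or when assigning everything to one population does better. The function names are hypothetical and only the Python standard library is assumed.

```python
import math


def phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ldf_total_error(T2: float, d: float, w: float) -> float:
    """Total misclassification probability w*e1 + (1-w)*e2 for the LDF."""
    fallback = min(w, 1.0 - w)                       # Remark 3.4.1
    if T2 <= 0.0:
        return fallback
    b = w + (1.0 - w) * d
    A = T2 * b                                       # equals nu'nu
    log_odds = math.log(w / (1.0 - w))
    if d == 1.0:
        c = T2 / 4.0 + 0.5 * log_odds                # (3.4.12)
    else:
        rad = T2 + (d - 1.0) / b * (math.log(d) + 2.0 * log_odds)
        if rad < 0.0:
            return fallback                          # no real cut-off point
        # Negative-sign root of (3.4.11), see Remark 3.4.3.
        c = b / ((d + 1.0) * (1.0 - d)) * (T2 - math.sqrt(d * T2) * math.sqrt(rad))
    e1 = 1.0 - phi(c * (d + 1.0) / math.sqrt(A))             # (3.4.5)
    e2 = phi((c * (d + 1.0) - A) / math.sqrt(d * A))         # (3.4.7)
    return min(fallback, w * e1 + (1.0 - w) * e2)


# A few parameter combinations from the grid used in Section 3.4.1.
values = {(T2, d): ldf_total_error(T2, d, 0.5) for T2 in (1, 4) for d in (0.1, 1.0)}
```

Under these assumptions the function returns about 0.31 for $T^2 = 1$, d = 1, $\omega$ = .5 and about 0.23 for $T^2 = 1$, d = 0.1, $\omega$ = .5, in line with the corresponding entries of Table 3.15.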


Apart from the positive constant factor $(2\pi)^{-1/2}$,

$\dfrac{\partial^2}{\partial c^2}[\omega e_1 + (1-\omega)e_2] = \dfrac{\omega(d+1)^2\,c(d+1)}{(T^2[\omega + (1-\omega)d])^{3/2}}\,e^{-\frac{1}{2}\frac{c^2(d+1)^2}{T^2[\omega + (1-\omega)d]}} - \dfrac{(1-\omega)(d+1)^2\{c(d+1) - T^2[\omega + (1-\omega)d]\}}{(dT^2[\omega + (1-\omega)d])^{3/2}}\,e^{-\frac{1}{2}\frac{\{c(d+1) - T^2[\omega + (1-\omega)d]\}^2}{dT^2[\omega + (1-\omega)d]}}$

Using (3.4.9), we have

$\dfrac{\partial^2}{\partial c^2}[\omega e_1 + (1-\omega)e_2] = \dfrac{(1-\omega)(d+1)^2}{T^2[\omega + (1-\omega)d]\,(dT^2[\omega + (1-\omega)d])^{1/2}}\left\{c(d+1) - \dfrac{c(d+1) - T^2[\omega + (1-\omega)d]}{d}\right\}e^{-\frac{1}{2}\frac{\{c(d+1) - T^2[\omega + (1-\omega)d]\}^2}{dT^2[\omega + (1-\omega)d]}}$

$> 0$

iff $c(d+1) - \dfrac{c(d+1) - T^2[\omega + (1-\omega)d]}{d} > 0$.

This happens if

$c = \dfrac{\omega + (1-\omega)d}{(d+1)(1-d)}\left[T^2 - (dT^2)^{1/2}\left\{T^2 + \dfrac{d-1}{\omega + (1-\omega)d}\left(\ln d + 2\ln\dfrac{\omega}{1-\omega}\right)\right\}^{1/2}\right]$, if d < 1,

$c = \dfrac{\omega + (1-\omega)d}{(d+1)(d-1)}\left[(dT^2)^{1/2}\left\{T^2 + \dfrac{d-1}{\omega + (1-\omega)d}\left(\ln d + 2\ln\dfrac{\omega}{1-\omega}\right)\right\}^{1/2} - T^2\right]$, if d > 1.

We shall now apply the same transformation as (3.4.2)
to the quadratic discriminant function (QDF) described in
Section 3.3. Then (3.2.1) reduces to ([19]),

$Z = \sum_{i=1}^{n}Z_i^2 > K$,

where

$Z_i^2 = \dfrac{|d-1|}{2d}\left(Y_i + \dfrac{\nu_i}{d-1}\right)^2$

and

$K = \ln\dfrac{\omega}{1-\omega} + \dfrac{n}{2}\ln d + \dfrac{[\omega + (1-\omega)d]\,T^2}{2(d-1)}$

Since the $Z_i$ are normal random variables, by a result of
Patnaik ([45]), the distribution of Z under $H_j$ (j = 1,2) can
be approximated by a multiple, $c_j$, of a central $\chi^2$-distribution
with $f_j$ degrees of freedom, where $c_j$ and $f_j$ are chosen
to satisfy

$\xi_j = E_{H_j}(Z) = E(c_j\chi^2_{f_j}) = c_jf_j$

and

$v_j^2 = \mathrm{Var}_{H_j}(Z) = \mathrm{Var}(c_j\chi^2_{f_j}) = 2c_j^2f_j$   (j = 1,2).

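It is easily shown that ([19]),

$\xi_1 = \dfrac{1}{2d}\left\{\dfrac{[\omega + (1-\omega)d]\,T^2}{|d-1|} + n\,|d-1|\right\}$,

$\xi_2 = \dfrac{1}{2}\left\{\dfrac{d[\omega + (1-\omega)d]\,T^2}{|d-1|} + n\,|d-1|\right\}$,

$v_1^2 = \dfrac{1}{d^2}\left\{[\omega + (1-\omega)d]\,T^2 + \dfrac{n}{2}(d-1)^2\right\}$,

$v_2^2 = d[\omega + (1-\omega)d]\,T^2 + \dfrac{n}{2}(d-1)^2$.   (3.4.13)

Thus $e_1$ and $e_2$ are approximated by

$e_1 = P(\text{assign } x \text{ to } H_2 \mid H_1 \text{ is true}) = P\left(\chi^2_{f_1} > \dfrac{K}{c_1}\right)$   (3.4.14)

and

$e_2 = P(\text{assign } x \text{ to } H_1 \mid H_2 \text{ is true}) = P\left(\chi^2_{f_2} < \dfrac{K}{c_2}\right)$   (3.4.15)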
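The moment matching behind the Patnaik approximation amounts to solving $c_jf_j = \xi_j$ and $2c_j^2f_j = v_j^2$, which gives $c_j = v_j^2/(2\xi_j)$ and $f_j = 2\xi_j^2/v_j^2$. The sketch below shows this step; the moment expressions in `qdf_moments` are one reading of (3.4.13) and should be treated as an assumption, and the function names are hypothetical.

```python
def patnaik_fit(mean: float, var: float):
    """Scale c and degrees of freedom f with c*f = mean and 2*c*c*f = var."""
    c = var / (2.0 * mean)
    f = 2.0 * mean * mean / var
    return c, f


def qdf_moments(T2: float, d: float, w: float, n: int):
    """Means and variances of Z under H1 and H2, as read from (3.4.13)."""
    if d == 1.0:
        raise ValueError("the quadratic term vanishes when d = 1")
    b = w + (1.0 - w) * d
    g = abs(d - 1.0)
    mean1 = (b * T2 / g + n * g) / (2.0 * d)
    mean2 = (d * b * T2 / g + n * g) / 2.0
    var1 = (b * T2 + 0.5 * n * g * g) / (d * d)
    var2 = d * b * T2 + 0.5 * n * g * g
    return (mean1, var1), (mean2, var2)


# One combination from the grid of Section 3.4.1: T^2 = 2, d = 2, w = 1/2, n = 6.
(m1, s1), (m2, s2) = qdf_moments(2.0, 2.0, 0.5, 6)
c1, f1 = patnaik_fit(m1, s1)
c2, f2 = patnaik_fit(m2, s2)
```

Evaluating (3.4.14) and (3.4.15) then needs a chi-square distribution function with fractional degrees of freedom, which is not in the Python standard library and is left out of this sketch.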


3.4.1 NUMERICAL RESULTS

For all combinations of the following parameter values:
$T^2 = 0, 1, 2, 4, 8$; $d = 0.1, 0.2, 0.5, 1, 2, 5, 10$; $\omega = \frac{1}{2}, \frac{2}{3}, \frac{5}{6}$ and
$n = 1, 2, 6, 10$; the total probability of misclassification
resulting from the use of the LDF considered here was calculated
using equations (3.4.5) and (3.4.7), and for the QDF using the
approximations (3.4.14) and (3.4.15). Some results are shown
graphically in the figures Fig. 3.20, Fig. 3.21 and Fig. 3.22.
The numerical results are shown in the following Tables:
Table 3.15, Table 3.16, Table 3.17.



[Fig. 3.20: linear discriminant function and quadratic discriminant function, $\omega$ = .5000; panels n = 6 and n = 10; axis ticks d = .1, .2, .5, 1, 2.]

[Figure: linear discriminant function and quadratic discriminant function, $\omega$ = .8333; curves labelled by $T^2$.]
Table 3.15

The probability of misclassification resulting from the
use of LDF based on Bhattacharyya distance (for all n).

ω        T²    d = .1   .2     .5     1      2      5      10
.5000    0     .50      .50    .50    .50    .50    .50    .50
         1     .23      .26    .30    .31    .30    .26    .23
         2     .18      .20    .23    .24    .23    .20    .18
         4     .11      .13    .15    .16    .15    .13    .11
         8     .05      .06    .08    .08    .08    .06    .05
.6667    0     .33      .33    .33    .33    .33    .33    .33
         1     .26      .28    .29    .27    .24    .21    .18
         2     .18      .20    .22    .22    .20    .17    .15
         4     .10      .12    .14    .15    .14    .12    .10
         8     .04      .05    .07    .08    .07    .06    .06
.8333    0     .17      .17    .17    .17    .17    .17    .17
         1     .17      .17    .17    .16    .14    .12    .11
         2     .16      .16    .15    .14    .12    .11    .09
         4     .0*      .09    .10    .10    .10    .08    .07
         8     .02      .03    .05    .05    .06    .05    .05



[Table 3.16 and Table 3.17: tables printed sideways in the original, indexed by $\omega$ = .5000, .6667, .8333, by $T^2$, by d and by n; the entries are not legible in this copy.]


From the tables we can conclude the following:

1) For n = 1, the QDF does little better than the LDF.

2) Agreement is adequate in the cases given below
(irrespective of the value of n):

i) for large T values,

ii) for w near 1,

iii) for a moderate range of d values.

3) Agreement becomes worse as n increases for a given 

ii). This occurs for the obvious reason that with large 
n, there are more variables with variance discrepancies 
to be utilized. 

3.5 TWO CLASSES OF TESTS (MODIFIED MINIMAX RULE) 

As shown earlier, maximization of the Bhattacharyya
distance yields a vector $a$ for the test given below

$$ a'x \;\underset{H_2}{\overset{H_1}{\gtrless}}\; c \tag{3.5.1} $$

and a class of tests can be generated by assigning various
values to $c$.

Let us now consider another method of specifying
the test (3.5.1). $e_1$ and $e_2$ are reproduced below for
convenience:

$$ e_1 = 1 - \Phi(\gamma_1), \quad \text{where } \gamma_1 = \frac{a'\mu - c}{(a'R_1a)^{1/2}} \tag{3.5.2} $$

and

$$ e_2 = 1 - \Phi(\gamma_2), \quad \text{where } \gamma_2 = \frac{c}{(a'R_2a)^{1/2}}. \tag{3.5.3} $$

We have taken $\mu_2 = 0$ for simplicity and denote $\mu_1$ by $\mu$.
We propose the following criterion:


$$ \min \Pr(e), \quad \text{for a given } k_2, \tag{3.5.4} $$

or

$$ \min_{\gamma_1 = k_2\gamma_2} e_2 \tag{3.5.5} $$

or equivalently,

$$ \max \gamma_2, \quad \text{for some } k_2. \tag{3.5.6} $$

After some simplification, (3.5.6) reduces to

$$ \max_{a}\left\{ \frac{a'\mu}{k_2(a'R_1a)^{1/2} + (a'R_2a)^{1/2}} \right\} \tag{3.5.7} $$

and this gives rise to another class of tests generated by
$k_2$, for $a$ and $c$ are provided by this criterion. Now we shall
establish a one-to-one correspondence between the two
classes of tests described above, which is embodied in the
following theorem.

Before stating the theorem we need to define the notion 
of equivalence of two tests. 

Definition 3.5.1: A test completely specified by the pair
$(a, c)$ is said to be equivalent to another if each probability
of misclassification of the former is equal to the corresponding
one of the latter.



Theorem 3.5.1: Let $(a_B, c_B)$ denote a test obtained by
maximizing the Bhattacharyya distance and $(a_{k_2}, c_{k_2})$ denote
that due to (3.5.7). Then

(a) for a given $c_B$, we can find a $k_2$ such that $(a_B, c_B)$ is
equivalent to $(a_{k_2}, c_{k_2})$ and $\gamma_1 = k_2\gamma_2$ for both
tests; conversely, for a given $k_2$ and associated $(a_{k_2}, c_{k_2})$,
there exists a $c_B$ such that $(a_{k_2}, c_{k_2})$ is equivalent to
$(a_B, c_B)$ and $\gamma_1 = k_2\gamma_2$ is maintained. The respective
$k_2$ and $c_B$ are given by the following relation:

$$ k_2 = \frac{a_B'\mu - c_B}{c_B}\left(\frac{a_B'R_2a_B}{a_B'R_1a_B}\right)^{1/2} \tag{3.5.8} $$

(b) there exists a $k_2^*$ given by

$$ k_2^* = -\frac{1}{\theta_B}\left(\frac{a_B'R_1a_B}{a_B'R_2a_B}\right)^{1/2} \tag{3.5.9} $$

where

$$ a_B = (R_1 - \theta_B R_2)^{-1}\mu \tag{3.5.10} $$

such that $a_{k_2^*} = a_B$, and we obtain an equivalent test by
choosing $c_B = c_{k_2^*}$.


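Proof: We see that the value of $a$ for which the quantity
within braces in (3.5.7) is a maximum satisfies

$$ a_{k_2} = (R_1 - \theta_1 R_2)^{-1}\mu \tag{3.5.11} $$

where

$$ \theta_1 = -\frac{1}{k_2}\left(\frac{a_{k_2}'R_1a_{k_2}}{a_{k_2}'R_2a_{k_2}}\right)^{1/2} \tag{3.5.12} $$

which follows by the usual calculus procedures.

(a) Suppose we are given $k_2$. Then $(a_{k_2}, c_{k_2})$ is
known, and the corresponding two types of errors are given via
$\gamma_{1(k_2)}$ and $\gamma_{2(k_2)}$ as follows:

$$ \gamma_{1(k_2)} = k_2\,\gamma_{2(k_2)}, \quad \text{where } \gamma_{2(k_2)} = \frac{a_{k_2}'\mu}{k_2(a_{k_2}'R_1a_{k_2})^{1/2} + (a_{k_2}'R_2a_{k_2})^{1/2}}. \tag{3.5.13} $$

Similarly, the two errors denoted by $e_{1B}$ and $e_{2B}$ due to
$(a_B, c_B)$, where $c_B$ is yet to be specified, are given via
$\gamma_{1(B)}$ and $\gamma_{2(B)}$ respectively, which are as follows:

$$ \gamma_{1(B)} = \frac{a_B'\mu - c_B}{(a_B'R_1a_B)^{1/2}}, \qquad \gamma_{2(B)} = \frac{c_B}{(a_B'R_2a_B)^{1/2}}. $$

Set $\gamma_{1(B)} = \gamma_{1(k_2)}$ and $\gamma_{2(B)} = \gamma_{2(k_2)}$, i.e.

$$ \frac{a_B'\mu - c_B}{(a_B'R_1a_B)^{1/2}} = \frac{k_2\,a_{k_2}'\mu}{k_2(a_{k_2}'R_1a_{k_2})^{1/2} + (a_{k_2}'R_2a_{k_2})^{1/2}} \tag{3.5.14} $$

and

$$ \frac{c_B}{(a_B'R_2a_B)^{1/2}} = \frac{a_{k_2}'\mu}{k_2(a_{k_2}'R_1a_{k_2})^{1/2} + (a_{k_2}'R_2a_{k_2})^{1/2}}. \tag{3.5.15} $$

(3.5.14) and (3.5.15) imply (3.5.8).

(b) Set $a_{k_2} = a_B$

$$ \Rightarrow \theta_1 = \theta_B, $$

which when put in (3.5.12) gives (3.5.9).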


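3.5.1 AN ILLUSTRATIVE EXAMPLE

Example 3.5.1: Let us consider Example 3.3.1 again.

(a) Let $k_2 = 2.0$. Then we find $a_{k_2}$ and consequently
$\gamma_{1(k_2)}$ and $\gamma_{2(k_2)}$ as follows:

$\gamma_{1(k_2)} = 2.0913$, $\gamma_{2(k_2)} = 1.0456$.

Next we find $a_B$, and then $c_B$ using (3.5.8). Thus

$c_B = 1.0454$,

and $\gamma_{1(B)}$ and $\gamma_{2(B)}$ are given by

$\gamma_{1(B)} =$ [value illegible], $\gamma_{2(B)} =$ [value illegible].

Now suppose $c_B$ is given and $c_B = 2.5$.

Then $\gamma_{1(B)} = 2.5995$, $\gamma_{2(B)} = 0.7936$,
and $k_2$, using (3.5.8), is given by

$k_2 = 3.2755$.

With this $k_2$ we find $a_{k_2}$, which gives $\gamma_{1(k_2)}$ and $\gamma_{2(k_2)}$:

$\gamma_{1(k_2)} =$ [value illegible], $\gamma_{2(k_2)} = 0.7941$.

(b) The value of $k_2^*$ for which $a_{k_2^*} = a_B$ is 1.1925.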

Remark 3.5.1: It is interesting to note that the above
theorem relates our criterion of maximizing the Bhattacharyya
distance with the minimization of the total probability of
misclassification subject to a linear relationship between
the two types of errors. Thus the procedure based on the
maximization of the Bhattacharyya distance is justified.
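The relation between the two classes of tests can be explored numerically. The following is a minimal sketch, not the computation used in the thesis: it assumes $\mu_2 = 0$, takes $\gamma_1 = (a'\mu - c)/(a'R_1a)^{1/2}$ and $\gamma_2 = c/(a'R_2a)^{1/2}$, and, for a fixed direction $a$, picks the threshold $c$ for which $\gamma_1 = k_2\gamma_2$. The function names and the toy values of `mu`, `R1`, `R2` are hypothetical.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gammas(a, c, mu, R1, R2):
    """gamma_1 and gamma_2 of the rule 'accept H1 if a'x >= c' (mu_2 = 0, mu_1 = mu)."""
    s1 = math.sqrt(a @ R1 @ a)
    s2 = math.sqrt(a @ R2 @ a)
    return (a @ mu - c) / s1, c / s2

def errors(a, c, mu, R1, R2):
    """The two probabilities of misclassification e_1 and e_2."""
    g1, g2 = gammas(a, c, mu, R1, R2)
    return 1.0 - std_normal_cdf(g1), 1.0 - std_normal_cdf(g2)

def threshold_for_ratio(a, k2, mu, R1, R2):
    """Threshold c that enforces gamma_1 = k2 * gamma_2 for a fixed vector a."""
    s1 = math.sqrt(a @ R1 @ a)
    s2 = math.sqrt(a @ R2 @ a)
    return (a @ mu) * s2 / (k2 * s1 + s2)

# Hypothetical toy problem (values chosen only for illustration).
mu = np.array([2.0, 1.0])
R1 = np.array([[1.0, 0.3], [0.3, 1.0]])
R2 = np.array([[2.0, -0.4], [-0.4, 1.5]])
a = np.linalg.solve(R1 + R2, mu)      # one admissible direction
k2 = 2.0
c = threshold_for_ratio(a, k2, mu, R1, R2)
g1, g2 = gammas(a, c, mu, R1, R2)
e1, e2 = errors(a, c, mu, R1, R2)
```

For a fixed $a$, increasing $k_2$ lowers the threshold and trades a smaller $e_1$ for a larger $e_2$; maximizing over $a$ as in (3.5.7) would require an additional optimization step that is not shown here.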

3.6 METHOD OF OBTAINING a IN THE CASE OF LARGE SAMPLE
FOR STATIONARY TIME SERIES

3.6.1 EXPRESSION FOR THE OPTIMAL a

First we list the assumptions under which we attempt
to give an explicit expression for the optimal vector $a$
in the sense of maximizing the Bhattacharyya distance
asymptotically. We make the following assumptions:

A1. The n-dimensional vector $x = (x(0), \ldots, x(n-1))'$ is a
covariance stationary normal time series with mean $\mu_j$ and
covariance matrix $R_j = ((r_j(s-t)))$, $s,t = 0,\ldots,n-1$, under
hypothesis $H_j$ ($j = 1,2$).

A2. The spectral densities $f_j(\lambda)$ of the process under
the hypotheses are positive on $[-\pi, \pi]$.

A3. The sequence of mean differences $\delta(t)$ satisfies

(i) $\sup_t |\delta(t)| < \infty$ and

(ii) $\xi_n(\tau) = n^{-1}\sum_{t=0}^{n-1-|\tau|} \delta(t+|\tau|)\,\delta(t)$

has a limit given by

$$ \xi(\tau) = \lim_{n\to\infty}\xi_n(\tau) = \int_{-\pi}^{\pi} e^{i\tau\lambda}\,\frac{dM(\lambda)}{2\pi} \tag{3.6.1} $$

where $M$ is a monotone non-decreasing function uniquely defined
by the conditions $M(-\pi) = 0$ and continuity from the right
(see [59]) and $\xi(0) > 0$.

A4. The covariance sequence $r_j(t)$ satisfies

$$ \sum_{t=-\infty}^{\infty} |t|^{1+\beta}\,|r_j(t)| < \infty \tag{3.6.2} $$

for $j = 1,2$ and some $\beta$, $0 < \beta < 1$.

Remark 3.6.1: (a) It may be noted that if we take $\delta(t)$
as a stochastic process which is ergodic in autocorrelation
([^3]), then $M(\lambda)$ is nothing but its spectral distribution
function.

(b) (3.6.2) implies that 

If j(Xi)-f j(X2)UC2 I ^ (3.6.2) 

where C 2 is some fixed constant (see [32]) ♦ (3»b.2]f is a 

sufficient condition (see [5]) for the Fourier transform 

of {rUt)}*” to converge uniformly to fj(X) in [- 71 , 71 ], 

J t=o 



Under the assumptions stated above, the following
theorem is a logical continuation of Remark 3.3.11.

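Theorem 3.6.1: Suppose the assumptions A1 to A4 are
satisfied. If $f_\theta(\lambda)$, defined by $f_\theta(\lambda) = f_1(\lambda) - \theta f_2(\lambda)$
for $\lambda \in [-\pi,\pi]$, is positive, then

$$ \lim_{n\to\infty}\left\{-\frac{1}{n}\ln\rho_2(1,2;y_\theta)\right\} = \frac{1}{4}\,\frac{\left[\displaystyle\int_{-\pi}^{\pi}\frac{1}{2\pi f_\theta(\lambda)}\,\frac{dM(\lambda)}{2\pi}\right]^2}{\displaystyle\int_{-\pi}^{\pi}\frac{f_1(\lambda)+f_2(\lambda)}{2\pi f_\theta^2(\lambda)}\,\frac{dM(\lambda)}{2\pi}} \tag{3.6.3} $$

where $y_\theta$ is defined in Remark 3.3.10. We need the following
lemmas to prove the theorem.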

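Lemma 3.6.1: Assume $R_\theta = R_1 - \theta R_2$ is positive definite;
this entails $f_\theta(\lambda) > 0$ on $[-\pi,\pi]$. Then

$$ \lim_{n\to\infty} n^{-1}\,\delta' R_\theta^{-1}\delta = \int_{-\pi}^{\pi}\frac{1}{2\pi f_\theta(\lambda)}\,\frac{dM(\lambda)}{2\pi}. $$

Proof: See Appendix E.

Lemma 3.6.2:

$$ \lim_{n\to\infty} n^{-1}\,\delta' R_\theta^{-1}R_jR_\theta^{-1}\delta = \int_{-\pi}^{\pi}\frac{f_j(\lambda)}{2\pi f_\theta^2(\lambda)}\,\frac{dM(\lambda)}{2\pi} \quad (j = 1,2). $$

Proof: See Appendix E.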

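Proof of Theorem 3.6.1: We have from (3.3.46),

$$ \rho_2(1,2;y_\theta) = \left(\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}\right)^{1/2}\exp\left\{-\frac{1}{4}\,\frac{(\delta'R_\theta^{-1}\delta)^2}{\sigma_1^2+\sigma_2^2}\right\}, $$

where $\sigma_j^2 = \delta'R_\theta^{-1}R_jR_\theta^{-1}\delta$ ($j = 1,2$).

Thus,

$$ -\frac{1}{n}\ln\rho_2(1,2;y_\theta) = -\frac{1}{4n}\ln\sigma_1^2 - \frac{1}{4n}\ln\sigma_2^2 + \frac{1}{2n}\ln\frac{\sigma_1^2+\sigma_2^2}{2} + \frac{1}{4n}\,\frac{(\delta'R_\theta^{-1}\delta)^2}{\sigma_1^2+\sigma_2^2} $$

$$ = -\frac{1}{4n}\ln\frac{\sigma_1^2}{n} - \frac{1}{4n}\ln\frac{\sigma_2^2}{n} + \frac{1}{2n}\ln\frac{\sigma_1^2+\sigma_2^2}{2n} + \frac{1}{4}\,\frac{\left(n^{-1}\delta'R_\theta^{-1}\delta\right)^2}{n^{-1}(\sigma_1^2+\sigma_2^2)}. \tag{3.6.4} $$

The first three terms in (3.6.4) converge to zero as $n \to \infty$
by Lemma 3.6.2. The last term in (3.6.4) converges to the
indicated value by Lemma 3.6.1 and Lemma 3.6.2. Hence
the theorem is proved.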

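Define

$$ G(\theta) = \frac{\left[\displaystyle\int_{-\pi}^{\pi}\frac{1}{f_\theta(\lambda)}\,\frac{dM(\lambda)}{2\pi}\right]^2}{\displaystyle\int_{-\pi}^{\pi}\frac{f_1(\lambda)+f_2(\lambda)}{f_\theta^2(\lambda)}\,\frac{dM(\lambda)}{2\pi}}. \tag{3.6.5} $$

The following theorem characterizes the value of $\theta$ for
which $-\ln\rho_2(1,2;y_\theta)$ has a maximum for large samples.

Theorem 3.6.2: The function $G(\theta)$ defined in (3.6.5) has
a global maximum at $\theta = -1$.

Proof: It follows immediately from the Cauchy-Schwarz
inequality. In fact,

$$ \left[\int_{-\pi}^{\pi}\frac{1}{f_\theta(\lambda)}\,\frac{dM(\lambda)}{2\pi}\right]^2 = \left[\int_{-\pi}^{\pi}\frac{(f_1(\lambda)+f_2(\lambda))^{1/2}}{f_\theta(\lambda)}\cdot\frac{1}{(f_1(\lambda)+f_2(\lambda))^{1/2}}\,\frac{dM(\lambda)}{2\pi}\right]^2 $$

$$ \le \int_{-\pi}^{\pi}\frac{f_1(\lambda)+f_2(\lambda)}{f_\theta^2(\lambda)}\,\frac{dM(\lambda)}{2\pi}\;\int_{-\pi}^{\pi}\frac{1}{f_1(\lambda)+f_2(\lambda)}\,\frac{dM(\lambda)}{2\pi}; $$

equality occurs when $\theta = -1$.

Thus the required optimal $a$ is given by

$$ a^* = (R_1+R_2)^{-1}\delta. \tag{3.6.6} $$

Consequently, our desired linear discriminant function is

$$ y^* = \delta'(R_1+R_2)^{-1}x. \tag{3.6.7} $$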
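The Cauchy-Schwarz argument can be checked numerically. The sketch below is illustrative only: it assumes $G(\theta)$ is the ratio $[\int f_\theta^{-1}\,dM/2\pi]^2 / \int (f_1+f_2) f_\theta^{-2}\,dM/2\pi$, replaces $dM/2\pi$ by a hypothetical discrete measure with four jump points, and uses AR(1)-type spectral densities as stand-ins for $f_1$ and $f_2$. None of these particular values come from the thesis.

```python
import numpy as np

def G(theta, f1, f2, weights):
    """G(theta) when dM/(2*pi) puts mass `weights` on a finite set of frequencies."""
    f_theta = f1 - theta * f2
    numerator = np.sum(weights / f_theta) ** 2
    denominator = np.sum(weights * (f1 + f2) / f_theta ** 2)
    return numerator / denominator

def ar1_density(lam, beta):
    """Spectral density of an AR(1) process with unit innovation variance."""
    return 1.0 / (2.0 * np.pi * np.abs(1.0 - beta * np.exp(-1j * lam)) ** 2)

# Hypothetical jump points and masses of the spectral measure.
lam = np.array([0.3, 1.0, 1.6, 2.5])
weights = np.array([0.4, 0.1, 0.3, 0.2])
f1 = ar1_density(lam, 0.2)
f2 = ar1_density(lam, 0.5)

thetas = np.linspace(-5.0, -0.05, 100)   # theta < 0 keeps f_theta positive
values = np.array([G(t, f1, f2, weights) for t in thetas])
g_star = G(-1.0, f1, f2, weights)
upper_bound = np.sum(weights / (f1 + f2))
```

Under these assumptions `g_star` coincides with the Cauchy-Schwarz bound, and no value on the grid exceeds it. When the measure has a single pair of symmetric jump points, $G$ is constant in $\theta$, which is why several jump points are used here.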

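Remark 3.6.2: The implication of our asymptotic form of $a$
in (3.6.6) is as follows:

$\forall\, n \ge n_0$, i.e. for all $n$ from a certain stage onwards,
$a^* = (R_1+R_2)^{-1}\delta$ would maximize $-\ln\rho_2(1,2;a'x)$.

It naturally needs (in the light of the discussions in
Section 3.3) to be demonstrated that $-\theta_n$ given by

$$ -\theta_n = \frac{\left(\dfrac{a'\delta}{a'(R_1+R_2)a}\right)^2 + \dfrac{1}{a'R_2a} - \dfrac{2}{a'(R_1+R_2)a}}{\left(\dfrac{a'\delta}{a'(R_1+R_2)a}\right)^2 + \dfrac{1}{a'R_1a} - \dfrac{2}{a'(R_1+R_2)a}} \tag{3.6.8} $$

converges to 1 as $n \to \infty$. In fact, (3.6.8) can be put in the
following form (after plugging in the value of $a^*$):

$$ -\theta_n = \frac{1 + \dfrac{1/n}{n^{-1}\delta'(R_1+R_2)^{-1}R_2(R_1+R_2)^{-1}\delta} - \dfrac{2/n}{n^{-1}\delta'(R_1+R_2)^{-1}\delta}}{1 + \dfrac{1/n}{n^{-1}\delta'(R_1+R_2)^{-1}R_1(R_1+R_2)^{-1}\delta} - \dfrac{2/n}{n^{-1}\delta'(R_1+R_2)^{-1}\delta}} $$

which converges to 1, by Lemma 3.6.1 and Lemma 3.6.2.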

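Remark 3.6.3: $a^*$ alone cannot specify a test. The value of $c$ of a
linear procedure is to be known in order to obtain a complete
description of the test. This can be done in the following
ways.

1) If we specify one kind of error (say, $e_1$), then $c$
is automatically known. One observation can be made in this
connection. Suppose $e_1$ is given, or equivalently $\gamma_1$. Then

$$ e_2 = 1 - \Phi\left(\frac{a^{*\prime}\delta - \gamma_1 (a^{*\prime}R_1a^*)^{1/2}}{(a^{*\prime}R_2a^*)^{1/2}}\right) $$

$$ = 1 - \Phi\left(\frac{\delta'(R_1+R_2)^{-1}\delta - \gamma_1\left(\delta'(R_1+R_2)^{-1}R_1(R_1+R_2)^{-1}\delta\right)^{1/2}}{\left(\delta'(R_1+R_2)^{-1}R_2(R_1+R_2)^{-1}\delta\right)^{1/2}}\right) $$

$$ = 1 - \Phi\left(\frac{n^{1/2}\,n^{-1}\delta'(R_1+R_2)^{-1}\delta - \gamma_1\left(n^{-1}\delta'(R_1+R_2)^{-1}R_1(R_1+R_2)^{-1}\delta\right)^{1/2}}{\left(n^{-1}\delta'(R_1+R_2)^{-1}R_2(R_1+R_2)^{-1}\delta\right)^{1/2}}\right) $$

$\to 0$ as $n \to \infty$, by Lemma 3.6.1 and Lemma 3.6.2.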


2) One can specify $c$ in such a way that the total
error of misclassification is a minimum.

3) Each procedure is evaluated in terms of the two
probabilities of misclassification. One procedure is better
than another if each probability of misclassification of
the former is not greater than the corresponding one of the
latter and at least one is less. A procedure is admissible
if there is no other procedure which is better.


The following theorem, which characterizes an admissible
procedure, is given in Bahadur and Anderson ([4]).

Theorem 3.6.3: A linear procedure with

$$ c = a'\mu_1 - t_1\,a'R_1a \tag{3.6.9} $$

$$ \phantom{c} = a'\mu_2 + t_2\,a'R_2a \tag{3.6.10} $$

for any $t_1, t_2$ such that $t_1R_1 + t_2R_2$ is positive definite
is admissible.



Given $t_1, t_2$ such that $t_1R_1 + t_2R_2$ is p.d., one would
compute the optimal $a$ satisfying $(t_1R_1 + t_2R_2)a = \delta$ and
then compute $c$ as given in (3.6.10). Usually, $t_1$ and $t_2$
are not given. We may specify them via the maximization of
the Bhattacharyya distance. For large samples, we can take
$t_1 = t_2 = 1$ and the procedure is

$$ a^* = (R_1+R_2)^{-1}\delta \tag{3.6.11} $$

$$ c^* = a^{*\prime}\mu_1 - a^{*\prime}R_1a^*. \tag{3.6.12} $$

Now we observe the following.

If $(e_1^*, e_2^*)$ are the two probabilities of misclassification
resulting from the use of the linear procedure defined by
(3.6.11) and (3.6.12), then by (3.5.27),

$$ e_1^* = 1 - \Phi\left(\frac{a^{*\prime}\mu_1 - c^*}{(a^{*\prime}R_1a^*)^{1/2}}\right) $$

$$ = 1 - \Phi\left(\left(\delta'(R_1+R_2)^{-1}R_1(R_1+R_2)^{-1}\delta\right)^{1/2}\right) $$

$$ = 1 - \Phi\left[n^{1/2}\left(n^{-1}\delta'(R_1+R_2)^{-1}R_1(R_1+R_2)^{-1}\delta\right)^{1/2}\right] $$

$\to 0$ as $n \to \infty$, by Lemma 3.6.2.

By a similar argument, $e_2^* \to 0$ as $n \to \infty$.



3.6.2 EXAMPLES

In this section, we give some examples to illustrate
the theory presented in the previous section. Examples
are chosen so as to satisfy the assumptions A1 to A4
of Section 3.6.1.

Let $\{Z(t), t \ge 0\}$ be an autoregressive normal
process of order 2 (abbreviated as AR(2)), i.e. $Z(t)$
satisfies

$$ Z(t) = \beta_1 Z(t-1) + \beta_2 Z(t-2) + e(t) \tag{3.6.13} $$

where $\{e(t), t \ge 0\}$ is a normal process with
$E\,e(t) = 0$ and

$$ \mathrm{Cov}(e(t), e(t+\tau)) = \begin{cases} 0, & \tau = 1,2,\ldots \\ 1, & \tau = 0. \end{cases} $$

Define $Z(t) = X(t) - \mu(t)$, where $EX(t) = \mu(t)$.

It is well known ([12]) that the process $\{Z(t), t \ge 0\}$
is stationary provided $|\xi_i| < 1$ ($i = 1,2$), where the $\xi_i$
are the roots of the equation

$$ \xi^2 - \beta_1\xi - \beta_2 = 0. \tag{3.6.14} $$

A sufficient condition in terms of $\beta_1, \beta_2$ for stationarity
of $\{Z(t), t \ge 0\}$ is that $\beta_1, \beta_2$ should lie in the triangular
region

$$ \beta_1 + \beta_2 < 1, \qquad \beta_2 - \beta_1 < 1, \qquad -1 < \beta_2 < 1. \tag{3.6.15} $$


It may be verified that under conditions (3.6.15), $|\xi_1| < 1$
and $|\xi_2| < 1$ (see [10]). Thus $\{Z(t), t \ge 0\}$ satisfies
the assumption A1 provided (3.6.15) holds.
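The agreement between the triangular region and the root condition can be checked directly. The sketch below is a hedged illustration: it assumes the characteristic equation is $\xi^2 - \beta_1\xi - \beta_2 = 0$ (the equation itself is not legible in the scan), and the function names are hypothetical.

```python
import numpy as np

def in_triangle(b1, b2):
    """Sufficient condition for stationarity of an AR(2) process."""
    return (b1 + b2 < 1.0) and (b2 - b1 < 1.0) and (-1.0 < b2 < 1.0)

def root_moduli(b1, b2):
    """Moduli of the roots of xi^2 - b1*xi - b2 = 0 (assumed characteristic equation)."""
    return np.abs(np.roots([1.0, -b1, -b2]))

# Coefficient pairs used in the AR(2) examples of this section, plus one outside the triangle.
pairs = [(0.5, 0.3), (0.5, -0.3), (0.2, -0.5), (0.5, 0.6)]
report = {p: (in_triangle(*p), float(root_moduli(*p).max())) for p in pairs}
```

Each entry of `report` pairs the triangle test with the largest root modulus, so the two criteria can be compared side by side.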

The spectral density of AR(2) is given by ([10])

$$ f(\lambda) = \frac{1}{2\pi}\,\frac{1}{\left|1 - \beta_1e^{-i\lambda} - \beta_2e^{-2i\lambda}\right|^2} > 0 \quad \text{on } [-\pi,\pi]. $$

Hence A2 is satisfied.
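Take

$$ \delta(t) = \cos\frac{\pi}{2}t. \tag{3.6.16} $$

Then,

$$ \frac{1}{n}\sum_{t=0}^{n-1-|\tau|}\delta(t+|\tau|)\,\delta(t) = \frac{1}{n}\sum_{t=0}^{n-1-|\tau|}\cos\frac{\pi}{2}(t+|\tau|)\,\cos\frac{\pi}{2}t $$

$$ = \frac{1}{2n}\sum_{t=0}^{n-1-|\tau|}\left\{\cos\frac{\pi}{2}(2t+|\tau|) + \cos\frac{\pi}{2}|\tau|\right\} $$

$$ = \frac{1}{2n}\left[(n-|\tau|)\cos\frac{\pi}{2}|\tau| + \sum_{t=0}^{n-1-|\tau|}\cos\frac{\pi}{2}(2t+|\tau|)\right] $$

$$ = \frac{1}{2}\left(1-\frac{|\tau|}{n}\right)\cos\frac{\pi}{2}|\tau| + \frac{1}{2n}\cos\frac{\pi}{2}|\tau|\sum_{t=0}^{n-1-|\tau|}(-1)^t, $$

since $\cos\frac{\pi}{2}(2t+|\tau|) = (-1)^t\cos\frac{\pi}{2}|\tau|$ for integer $t$. The last
sum equals 0 or 1, so the second term vanishes as $n \to \infty$.

This implies

$$ \lim_{n\to\infty}\frac{1}{n}\sum_{t=0}^{n-1-|\tau|}\delta(t+|\tau|)\,\delta(t) = \frac{1}{2}\cos\frac{\pi}{2}\tau. $$

Now, take $M(\lambda)$ to be a step function having jumps at $\pm\frac{\pi}{2}$ of
height $\frac{\pi}{2}$. Then

$$ \int_{-\pi}^{\pi}e^{i\tau\lambda}\,\frac{dM(\lambda)}{2\pi} = \frac{1}{2}\cos\frac{\pi}{2}\tau. $$

Thus,

$$ \lim_{n\to\infty}\frac{1}{n}\sum_{t=0}^{n-1-|\tau|}\delta(t+|\tau|)\,\delta(t) = \int_{-\pi}^{\pi}e^{i\tau\lambda}\,\frac{dM(\lambda)}{2\pi}. $$

Hence, A3 is satisfied.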
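The limit in A3 for this choice of mean difference is easy to confirm numerically. The following sketch assumes $\delta(t) = \cos(\pi t/2)$ and compares the finite-sample average with $\frac{1}{2}\cos(\pi\tau/2)$; the function names are hypothetical.

```python
import math

def xi_n(tau, n):
    """Finite-sample average (1/n) * sum_{t=0}^{n-1-|tau|} delta(t+|tau|) * delta(t)."""
    tau = abs(tau)
    total = sum(
        math.cos(math.pi * (t + tau) / 2.0) * math.cos(math.pi * t / 2.0)
        for t in range(n - tau)
    )
    return total / n

def xi_limit(tau):
    """Claimed limit 0.5 * cos(pi * tau / 2)."""
    return 0.5 * math.cos(math.pi * tau / 2.0)

# Largest discrepancy over a few lags for a moderately large sample size.
gaps = [abs(xi_n(tau, 4000) - xi_limit(tau)) for tau in range(5)]
```

The discrepancy is of order $|\tau|/n$, in line with the factor $(1 - |\tau|/n)$ in the derivation.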



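Let $r(h) = E\,Z(t)Z(t+h)$.

Then it can be shown that $r(h)$ satisfies the difference
equation

$$ r(h) = \beta_1 r(h-1) + \beta_2 r(h-2) \tag{3.6.17} $$

the solution of which is given by ([10]),

$$ r(h) = \frac{(1-\xi_2^2)\,\xi_1^{h+1} - (1-\xi_1^2)\,\xi_2^{h+1}}{(\xi_1-\xi_2)(1+\xi_1\xi_2)} \quad (h = 0,1,2,\ldots) \tag{3.6.18} $$

where the $\xi_j$'s are the roots of (3.6.14).

Write $r(h) = a_1\xi_1^{h+1} + a_2\xi_2^{h+1}$, where

$$ a_1 = \frac{1-\xi_2^2}{(\xi_1-\xi_2)(1+\xi_1\xi_2)}, \qquad a_2 = \frac{1-\xi_1^2}{(\xi_2-\xi_1)(1+\xi_1\xi_2)}. $$

Then

$$ \sum_{t=0}^{\infty}|t|^{1+\beta}\,|r(t)| \le |a_1|\left(\sum_{t=0}^{\infty}|t|^{1+\beta}|\xi_1|^{t+1}\right) + |a_2|\left(\sum_{t=0}^{\infty}|t|^{1+\beta}|\xi_2|^{t+1}\right). $$

Letting $u_t = t^{1+\beta}|\xi_1|^t$, we have

$$ \lim_{t\to\infty}\frac{u_{t+1}}{u_t} = \lim_{t\to\infty}\left(1+\frac{1}{t}\right)^{1+\beta}|\xi_1| = |\xi_1| < 1. $$

Thus by D'Alembert's ratio test, $\sum_t |t|^{1+\beta}|\xi_1|^t < \infty$.

Similarly, we have $\sum_t |t|^{1+\beta}|\xi_2|^t < \infty$.

Hence A4 is satisfied.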

Example 3.6.1: Let

$H_1$: $Z(t) = 0.5\,Z(t-1) + 0.3\,Z(t-2) + e(t)$

$H_2$: $Z(t) = 0.5\,Z(t-1) - 0.3\,Z(t-2) + e(t)$

Let

$E_{H_2}X(t) = 0$, $E_{H_1}X(t) = \delta(t) = \cos\frac{\pi}{2}t$.

The first row of $R_1$ is given by (for n = 25):

(1.0, 0.7143, 0.6571, 0.5428, 0.4685, 0.39712, 0.33912, 0.28870, 0.2460,
0.20965, 0.17865, 0.15222, 0.12970, 0.11052, 0.09417, 0.08024, 0.06837,
0.05825, 0.04964, 0.04229, 0.036041, 0.03071, 0.026167, 0.022291,
0.01899).



The first row of $R_2$ is given by (for n = 25):

(1.0, 0.385, -0.108, -0.1695, [illegible], 0.02467, 0.02804,
0.00661, -0.0051, -0.004537, -0.000737, 0.00099, 0.00072,
0.0000, -0.0002, -0.00011, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000).
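The listed rows can be cross-checked against the AR(2) difference equation. The sketch below assumes the rows are autocorrelations normalized so that $r(0) = 1$ (the listed rows start with 1.0) and uses the Yule-Walker starting value $r(1) = \beta_1/(1-\beta_2)$; the function name is hypothetical.

```python
def ar2_autocorrelations(b1, b2, n):
    """First n autocorrelations r(0), ..., r(n-1) of an AR(2) process, with r(0) = 1."""
    r = [1.0, b1 / (1.0 - b2)]             # r(1) follows from r(1) = b1*r(0) + b2*r(1)
    for h in range(2, n):
        r.append(b1 * r[h - 1] + b2 * r[h - 2])
    return r[:n]

row_h1 = ar2_autocorrelations(0.5, 0.3, 25)    # model under H1 in Example 3.6.1
row_h2 = ar2_autocorrelations(0.5, -0.3, 25)   # model under H2 in Example 3.6.1
```

The leading entries of `row_h1` and `row_h2` agree with the legible leading entries of the two rows above to about three decimal places, which also gives a way to fill in an entry that the scan has lost.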

We have solved the implicit equation (3.3.6) for n = 10, 11, ..., 25.
The pertinent results are shown in Table 3.18. It is clear
from the table that approximating the solution of (3.3.6) by
$a^* = (R_1+R_2)^{-1}\delta$ becomes more and more accurate as n becomes
larger and larger. See also Fig. 3.23.

Similar conclusions can be drawn for the examples that follow.
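An iteration of the kind reported in the tables can be sketched as follows. This is an assumption-laden illustration, not the author's program: it takes the implicit equation to be $(R_1 - \theta R_2)a = \delta$ with $\theta = -\{m + 1/(a'R_2a) - 2/s\}/\{m + 1/(a'R_1a) - 2/s\}$, where $s = a'(R_1+R_2)a$ and $m = (a'\delta/s)^2$, starts from $a_0 = (R_1+R_2)^{-1}\delta$, and stops when successive values of $\theta$ differ by less than a tolerance. All function names are hypothetical.

```python
import numpy as np

def toeplitz_from_row(row):
    """Symmetric Toeplitz covariance matrix built from its first row."""
    n = len(row)
    return np.array([[row[abs(i - j)] for j in range(n)] for i in range(n)])

def theta_of(a, delta, R1, R2):
    """theta implied by a given vector a (assumed form of the updating formula)."""
    s1 = a @ R1 @ a
    s2 = a @ R2 @ a
    s = s1 + s2
    m = (a @ delta / s) ** 2
    return -(m + 1.0 / s2 - 2.0 / s) / (m + 1.0 / s1 - 2.0 / s)

def solve_implicit(delta, R1, R2, eps=1e-8, max_iter=100):
    """Fixed-point iteration on theta; returns (a, theta, iterations used)."""
    a = np.linalg.solve(R1 + R2, delta)          # assumed starting value
    theta = theta_of(a, delta, R1, R2)
    for i in range(1, max_iter + 1):
        a = np.linalg.solve(R1 - theta * R2, delta)
        new_theta = theta_of(a, delta, R1, R2)
        if abs(new_theta - theta) < eps:
            return a, new_theta, i
        theta = new_theta
    return a, theta, max_iter

def ar2_row(b1, b2, n):
    """Normalized AR(2) autocorrelations r(0..n-1)."""
    r = [1.0, b1 / (1.0 - b2)]
    for h in range(2, n):
        r.append(b1 * r[h - 1] + b2 * r[h - 2])
    return r[:n]

n = 10
delta = np.cos(np.pi * np.arange(n) / 2.0)
R1 = toeplitz_from_row(ar2_row(0.5, 0.3, n))
R2 = toeplitz_from_row(ar2_row(0.5, -0.3, n))

# Sanity case: identical covariances give theta = -1 after a single pass.
a_eq, theta_eq, iters_eq = solve_implicit(delta, R1, R1.copy())

# Setting resembling Example 3.6.1 (no claim is made about the resulting value).
a_hat, theta_hat, iters = solve_implicit(delta, R1, R2)
```

When the two covariance matrices coincide, the update returns $\theta = -1$ immediately, which matches the equal-covariance case discussed in the text. For unequal covariances the fixed-point scheme is only one of several possible ways to solve the implicit equation, and its convergence is not guaranteed by this sketch.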

Example 3.6.2: Here we change only $R_2$, viz.,

$H_1$: $Z(t) = 0.5\,Z(t-1) + 0.3\,Z(t-2) + e(t)$

$H_2$: $Z(t) = 0.2\,Z(t-1) - 0.5\,Z(t-2) + e(t)$

The first row of $R_2$ is given by (for n = 20):

(1.0, 0.1333, -0.4753, -0.27666, 0.00366, 0.08485, 0.04131,
-0.00479, -0.01479, -0.00595, 0.00145, 0.002516, 0.00082, -0.00034,
-0.000418, -0.000106, 0.0000, 0.0000, 0.0000, 0.0000).

The results are shown in Table 3.19. See also Fig. 3.24.

In the following examples we omit the tables as well as the
covariance matrices. We give only the pertinent
information through figures.



Example 3.6.3: Let

$H_1$: $\beta_1 = .5$, $\beta_2 = .3$

$H_2$: $\beta_1 = -.5$, $\beta_2 = -.3$

and $\delta(t) = \cos(\text{[illegible]}\,t)$.

See Fig. 3.25.

Example 3.6.4: Let $H_1$ and $H_2$ be the same as in Example 3.6.3,
and $\delta(t) = (2)^{\text{[illegible]}}\cos(\text{[illegible]}\,t)$.

See Fig. 3.26.

Remark 3.6.4: The convergence of $\theta$ to -1 is more rapid
in Example 3.6.4 than in Example 3.6.3.

In what follows we consider examples based on covariance
stationary AR(1) processes defined by

$Z(t+1) = \beta Z(t) + e(t+1)$, for $|\beta| < 1$, $t = 0,1,2,\ldots$

It can be easily verified that such a process satisfies the
assumptions A1 to A4.

Example 3.6.5: Let

$H_1$: $\beta = .2$ and $H_2$: $\beta = .5$,
$\delta(t) = \sin(\text{[illegible]}\,t)$.

See Fig. 3.27.

Example 3.6.6: Let

$H_1$: $\beta = -.5$; $H_2$: $\beta = .8$,

$\delta(t) = \sin(\text{[illegible]}\,t)$.

See Fig. 3.28.

Example 3.6.7: Let

$H_1$: $\beta = -.5$; $H_2$: $\beta = .8$, $\delta(t) = (2)^{\text{[illegible]}}\sin(\text{[illegible]}\,t)$.

See Fig. 3.29.





Table 3.18

Computations in Example 3.6.1

Number of          Value of θ at which        Number of iterations
observations (n)   the iteration ends         required

10                 -0.4684516                 4
11                 -0.4872228                 4
12                 -0.5169756                 4
13                 -0.5319041                 4
14                 -0.5567487                 4
15                 -0.5695292                 4
16                 -0.5904840                 4
17                 -0.6013405                 3
18                 -0.6192849                 3
19                 -0.6284942                 3
20                 -0.6431151                 3
21                 -0.6519033                 3
22                 -0.6648376                 3
23                 -0.6722167                 3
24                 -0.6837141                 3
25                 -0.6906160                 3

The initial value of a for each iteration, for all n: a_0 = 1.



Table 3.19

Computations in Example 3.6.2

Number of          Value of θ at which
observations (n)   the iteration ends

2                  0.0403760
3                  -0.0760228
4                  -0.0810608
5                  -0.0961092
6                  -0.1168525
7                  -0.1486499
8                  -0.1678503
9                  -0.2012634
10                 -0.2148456
11                 -0.2407287
12                 -0.2532185
13                 -0.2773068
14                 -0.2882282
15                 -0.3103702
16                 -0.3198107
17                 -0.3398194
18                 -0.3482509
19                 -0.3667486
20                 -0.3742574

Number of iterations required (in the order printed; one entry is
missing in the scan, so the values cannot be aligned with n reliably):
6, 2, 4, 4, 4, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4.

The initial value of a for each iteration, for all n: a_0 = (R_1+R_2)^{-1}δ.




Fig. 3.24 Example 3.6.2 (AR(2)).


3.7 CONCLUSION 

We see in this chapter that our criterion of maximizing
the Bhattacharyya distance for the optimal LDF leads to solving
an implicit equation for a discrete time series. By comparison
with other LDFs and the QDF, we note that the distance of our
interest is worth considering. In the case of stationary
time series, we observe that, whereas no explicit analytical
expression is available for optimal LDFs under the criteria
considered so far in the literature, the maximization of
the Bhattacharyya distance does give one for large samples.



CHAPTER IV 


LINEAR DISCRIMINANT FUNCTIONS FOR CONTINUOUS-TIME SERIES

4.1 INTRODUCTION

In Chapter III, the process $\{x(t), t \in T\}$ of our interest
was discrete in time. Now we assume the parameter set T is an
interval of the real line, finite or infinite. The basic tool
in dealing with a continuous-time parameter process is
sampling, which converts a continuous-time parameter process
to one discrete in time. When the process is not stationary,
the problem is attempted via a series representation of
$\{x(t), t \in T\}$. This is done in Section 4.2, where it is shown
that our criterion of maximizing the Bhattacharyya distance
yields an integral equation of Fredholm type to be solved.
Section 4.3 contains an explicit expression for the linear
discriminant function in the case when the process is covariance
stationary; we have used Shannon's sampling theorem to
deal with that case.

4.2 SECOND ORDER TIME SERIES 

The observed process $\{x(t), t \in T\}$ is assumed to be
continuous in time. Let $T = [0,A]$, where A is a real number
(finite). It is also assumed to be a second order process, i.e.
$E\,x^2(t) < \infty$ $\forall\, t \in [0,A]$.

Let

$$ E_{H_j}x(t) = \mu_j(t) \tag{4.2.1} $$

$$ \mathrm{Cov}_{H_j}(x(t), x(u)) = R_j(t,u) \tag{4.2.2} $$

where $j = 1,2$; $t, u \in [0,A]$.

Assume $|\delta(t)| = |\mu_1(t) - \mu_2(t)| < \infty$ in $[0,A]$.

Our first step is to reduce the process $\{x(t), t \in T\}$ to
a set of random variables (possibly a countably infinite set) ([63]).
This is achieved by the method of series expansion:


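$$ x(t) = \sum_{n=1}^{\infty} x_n\,\varphi_n(t) \tag{4.2.3} $$

in the following sense:

$$ \lim_{K\to\infty} E\left(x(t) - \sum_{n=1}^{K}x_n\varphi_n(t)\right)^2 = 0, \quad 0 \le t \le A, $$

where

$$ x_n = \int_0^A x(t)\,\varphi_n(t)\,dt \tag{4.2.4} $$

and $\{\varphi_n(t)\}$ is a set of complete orthonormal functions in
the interval $[0,A]$ with

$$ \int_0^A \varphi_n(t)\,\varphi_m(t)\,dt = \delta_{nm} = \begin{cases} 1, & n = m \\ 0, & n \ne m \end{cases} $$

(see Appendix F).

Let $a(t) \in L^2(0,A)$, i.e. $a(t)$ is a square integrable
function on $[0,A]$. Then the Fourier series of $a(t)$ is

$$ a(t) = \sum_{n=1}^{\infty} a_n\,\varphi_n(t) \tag{4.2.5} $$

where

$$ a_n = \int_0^A a(t)\,\varphi_n(t)\,dt. \tag{4.2.6} $$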


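Consider K terms in the series (4.2.3), having the
coefficients $(x_1, \ldots, x_K)$. Let $x = (x_1, \ldots, x_K)'$.

Then $x \sim N(\mu_j, R_j)$ under $H_j$ ($j = 1,2$),

where $\mu_j = (\mu_{j1}, \ldots, \mu_{jK})'$ and
$R_j = ((\mathrm{Cov}_{H_j}(x_n, x_m)))$, $n,m = 1,\ldots,K$.

Following the discussions in the previous chapter, our
classification rule would have been:

$$ \sum_{n=1}^{K}a_nx_n \;\underset{H_2}{\overset{H_1}{\gtrless}}\; c \quad \text{(accept)}. $$

But

$$ \lim_{K\to\infty}\sum_{n=1}^{K}a_nx_n = \lim_{K\to\infty}\sum_{n=1}^{K}\left(\int_0^A a(t)\varphi_n(t)\,dt\right)x_n = \lim_{K\to\infty}\int_0^A a(t)\left(\sum_{n=1}^{K}x_n\varphi_n(t)\right)dt = \int_0^A a(t)\,x(t)\,dt, $$

the limit being in quadratic mean.

Hence our linear procedure in the case when the process
is continuous in time is given by

$$ y = \int_0^A a(t)\,x(t)\,dt \;\underset{H_2}{\overset{H_1}{\gtrless}}\; c \quad \text{(accept)} \tag{4.2.7} $$

(For the definition of the stochastic integral see Appendix I.)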



Thus our problem is now to find an optimal $a(t)$, $t \in [0,A]$,
in the sense of maximizing $-\ln\rho_2(1,2;y)$, where $y$ is defined in
(4.2.7).

The following theorem states the method of obtaining $a(t)$.

Theorem 4.2.1: The $a(t)$, $t \in [0,A]$, for which $-\ln\rho_2(1,2;y)$ is
a maximum satisfies a Fredholm integral equation of the first
kind

$$ \int_0^A R_\theta(t,u)\,a(t)\,dt = \delta(u), \tag{4.2.8} $$

to be solved iteratively, where $\theta$ is given by

$$ \theta = -\,\frac{\left[\dfrac{\int_0^A a(t)\delta(t)\,dt}{\int_0^A\!\!\int_0^A R(t,u)a(t)a(u)\,dt\,du}\right]^2 + \dfrac{1}{\int_0^A\!\!\int_0^A R_2(t,u)a(t)a(u)\,dt\,du} - \dfrac{2}{\int_0^A\!\!\int_0^A R(t,u)a(t)a(u)\,dt\,du}}{\left[\dfrac{\int_0^A a(t)\delta(t)\,dt}{\int_0^A\!\!\int_0^A R(t,u)a(t)a(u)\,dt\,du}\right]^2 + \dfrac{1}{\int_0^A\!\!\int_0^A R_1(t,u)a(t)a(u)\,dt\,du} - \dfrac{2}{\int_0^A\!\!\int_0^A R(t,u)a(t)a(u)\,dt\,du}} \tag{4.2.9} $$

and $R_\theta(t,u) = R_1(t,u) - \theta R_2(t,u)$,

$R(t,u) = R_1(t,u) + R_2(t,u)$.

The iteration continues until $|\theta_i - \theta_{i-1}| < \varepsilon$, where $\varepsilon$
is a pre-assigned number and $i$ denotes the number of the iteration.



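Proof: For a given $u$, the function $R_1(t,u)$ can be expanded
into a Fourier series in the interval $[0,A]$:

$$ R_1(t,u) = \sum_{n=1}^{\infty}\beta_{1n}(u)\,\varphi_n(t) \tag{4.2.10} $$

where

$$ \beta_{1n}(u) = \int_0^A R_1(t,u)\,\varphi_n(t)\,dt. \tag{4.2.11} $$

Let $R_1(n,m) = \mathrm{Cov}(x_n, x_m)$ ($n,m = 1,\ldots,K$),

where $x_n = \int_0^A x(t)\varphi_n(t)\,dt$.

Thus $x_n - \mu_{1n} = \int_0^A (x(t) - \mu_1(t))\,\varphi_n(t)\,dt$,

since $\mu_1(t) = \sum_{n=1}^{\infty}\mu_{1n}\varphi_n(t)$.

Now $R_1(n,m)$ can be written as

$$ R_1(n,m) = E\,(x_n - \mu_{1n})(x_m - \mu_{1m}) $$

$$ = E\left[\int_0^A (x(t)-\mu_1(t))\varphi_n(t)\,dt\int_0^A (x(u)-\mu_1(u))\varphi_m(u)\,du\right] $$

$$ = \int_0^A\!\!\int_0^A R_1(t,u)\,\varphi_n(t)\varphi_m(u)\,dt\,du $$

$$ = \int_0^A\left(\int_0^A R_1(t,u)\varphi_n(t)\,dt\right)\varphi_m(u)\,du $$

$$ = \int_0^A \beta_{1n}(u)\,\varphi_m(u)\,du, \tag{4.2.12} $$

by (4.2.11).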


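If we consider the finite set of random variables $(x_1, \ldots, x_K)$,
then the $a$ vector of the linear procedure satisfies the system
of equations

$$ R_\theta\,a = \delta \tag{4.2.13} $$

where

$$ \theta = -\,\frac{\left(\dfrac{a'\delta}{a'(R_1+R_2)a}\right)^2 + \dfrac{1}{a'R_2a} - \dfrac{2}{a'(R_1+R_2)a}}{\left(\dfrac{a'\delta}{a'(R_1+R_2)a}\right)^2 + \dfrac{1}{a'R_1a} - \dfrac{2}{a'(R_1+R_2)a}} \tag{4.2.14} $$

(cf. (3.3.6) and (3.3.7)), and $R_\theta = R_1 - \theta R_2$.

The nth equation of (4.2.13) is:

$$ \sum_{m=1}^{K}R_\theta(n,m)\,a_m = \delta_n $$

which after being multiplied on both sides by $\varphi_n(u)$ and then
summing over n (n = 1,...,K) yields

$$ \sum_{n=1}^{K}\sum_{m=1}^{K}R_\theta(n,m)\,a_m\,\varphi_n(u) = \sum_{n=1}^{K}\delta_n\varphi_n(u). $$

Let $K \to \infty$. Then we have,

$$ \sum_{n=1}^{\infty}\sum_{m=1}^{\infty}R_\theta(n,m)\,a_m\,\varphi_n(u) = \delta(u) $$

$$ \Rightarrow \sum_{n=1}^{\infty}\sum_{m=1}^{\infty}R_\theta(m,n)\,a_n\,\varphi_m(u) = \delta(u) $$

$$ \Rightarrow \sum_{n,m=1}^{\infty}R_\theta(n,m)\left(\int_0^A a(t)\varphi_n(t)\,dt\right)\varphi_m(u) = \delta(u) $$

$$ \Rightarrow \int_0^A\left(\sum_{n,m}R_\theta(n,m)\,\varphi_n(t)\varphi_m(u)\right)a(t)\,dt = \delta(u), $$

(since $|\delta(u)| < \infty$ in $[0,A]$, $\int$ and $\sum$ can be interchanged),
which can also be written as

$$ \int_0^A\left(\sum_{n,m}R_1(n,m)\varphi_n(t)\varphi_m(u)\right)a(t)\,dt - \theta\int_0^A\left(\sum_{n,m}R_2(n,m)\varphi_n(t)\varphi_m(u)\right)a(t)\,dt = \delta(u). \tag{4.2.15} $$

The first term on the left hand side of (4.2.15) can
be written as

$$ \int_0^A\left(\sum_{n,m}R_1(n,m)\varphi_n(t)\varphi_m(u)\right)a(t)\,dt $$

$$ = \int_0^A\left\{\sum_{n,m}\left(\int_0^A\beta_{1n}(v)\varphi_m(v)\,dv\right)\varphi_n(t)\varphi_m(u)\right\}a(t)\,dt, \quad \text{(by 4.2.12)} $$

$$ = \int_0^A\left[\sum_{m}\left\{\int_0^A\left(\sum_{n=1}^{\infty}\beta_{1n}(v)\varphi_n(t)\right)\varphi_m(v)\,dv\right\}\varphi_m(u)\right]a(t)\,dt $$

$$ = \int_0^A\left\{\sum_{m=1}^{\infty}\left(\int_0^A R_1(t,v)\varphi_m(v)\,dv\right)\varphi_m(u)\right\}a(t)\,dt, \quad \text{(by 4.2.10)} $$

$$ = \int_0^A\left(\sum_{m=1}^{\infty}\beta_{1m}(t)\varphi_m(u)\right)a(t)\,dt, \quad \text{(by 4.2.11)} $$

$$ = \int_0^A R_1(u,t)\,a(t)\,dt, \quad \text{(by 4.2.10)} $$

$$ = \int_0^A R_1(t,u)\,a(t)\,dt. \tag{4.2.16} $$

It is to be noted that what we have actually shown above is

$$ \sum_{n,m=1}^{\infty}R_1(n,m)\,\varphi_n(t)\varphi_m(u) = R_1(t,u). \tag{4.2.17} $$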



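By a similar argument, the second term on the left hand side
of (4.2.15) can be put in the form

$$ \theta\int_0^A R_2(t,u)\,a(t)\,dt. \tag{4.2.18} $$

Using (4.2.16) and (4.2.18), (4.2.15) reduces to (4.2.8):

$$ \int_0^A R_\theta(t,u)\,a(t)\,dt = \delta(u) $$

where $\theta$ is given by (using 4.2.14),

$$ \theta = \lim_{K\to\infty}\; -\,\frac{\left\{\dfrac{\sum_{n=1}^{K}a_n\delta_n}{\sum_{n,m=1}^{K}R(n,m)a_na_m}\right\}^2 + \dfrac{1}{\sum_{n,m=1}^{K}R_2(n,m)a_na_m} - \dfrac{2}{\sum_{n,m=1}^{K}R(n,m)a_na_m}}{\left\{\dfrac{\sum_{n=1}^{K}a_n\delta_n}{\sum_{n,m=1}^{K}R(n,m)a_na_m}\right\}^2 + \dfrac{1}{\sum_{n,m=1}^{K}R_1(n,m)a_na_m} - \dfrac{2}{\sum_{n,m=1}^{K}R(n,m)a_na_m}} $$

= right hand side of (4.2.9),

which follows from the following two facts:

1)
$$ \lim_{K\to\infty}\sum_{n=1}^{K}a_n\delta_n = \lim_{K\to\infty}\sum_{n=1}^{K}\left(\int_0^A a(t)\varphi_n(t)\,dt\right)\delta_n = \sum_{n=1}^{\infty}\left(\int_0^A a(t)\varphi_n(t)\,dt\right)\delta_n $$

$$ = \int_0^A\left(\sum_{n=1}^{\infty}\delta_n\varphi_n(t)\right)a(t)\,dt = \int_0^A a(t)\,\delta(t)\,dt. $$

2) (i)
$$ \lim_{K\to\infty}a'R_1a = \lim_{K\to\infty}\sum_{n,m=1}^{K}R_1(n,m)\,a_na_m = \sum_{n,m=1}^{\infty}R_1(n,m)\left(\int_0^A a(t)\varphi_n(t)\,dt\right)\left(\int_0^A a(u)\varphi_m(u)\,du\right) $$

$$ = \int_0^A\!\!\int_0^A\left(\sum_{n,m=1}^{\infty}R_1(n,m)\varphi_n(t)\varphi_m(u)\right)a(t)a(u)\,dt\,du = \int_0^A\!\!\int_0^A R_1(t,u)\,a(t)a(u)\,dt\,du, $$

using (4.2.17).

(ii) By similar arguments,

$$ \lim_{K\to\infty}a'R_2a = \int_0^A\!\!\int_0^A R_2(t,u)\,a(t)a(u)\,dt\,du. $$

Hence the theorem is proved.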

Remark 4.2.1: It may happen that the integral equation (4.2.8),
for a given $\theta$, does not have a solution. This can be seen as
follows ([62]). To this end the following theorem may be
useful.

Theorem (Hilbert-Schmidt theorem)

If $\delta(u)$ can be written in the form

$$ \delta(u) = \int_0^A R_\theta(t,u)\,a(t)\,dt \tag{4.2.19} $$

where $R_\theta(t,u)$ is a symmetric $L_2$-kernel (i.e. $\int_0^A\!\int_0^A R_\theta^2(t,u)\,dt\,du < \infty$
and $R_\theta(t,u) = R_\theta(u,t)$) and $\int_0^A a^2(t)\,dt < \infty$, then we can write

$$ \delta(u) = \sum_{h=1}^{\infty}\delta_h\,\psi_h(u), \tag{4.2.20} $$

where

$$ \delta_h = \int_0^A \delta(u)\,\psi_h(u)\,du \quad (h = 1,2,\ldots) \tag{4.2.21} $$

and $\{\psi_h\}$ is the orthonormal system of eigenfunctions of
$R_\theta(t,u)$ satisfying

$$ \int_0^A R_\theta(t,u)\,\psi_h(u)\,du = \frac{\psi_h(t)}{\lambda_h}; \tag{4.2.22} $$

the $\lambda_h$'s are called the eigenvalues of the kernel.

Let

$$ a(t) = \sum_{h=1}^{\infty}c_h\,\psi_h(t), \tag{4.2.23} $$

where $c_h = \int_0^A a(t)\,\psi_h(t)\,dt$.

Then from (4.2.19), we get

$$ \delta(u) = \int_0^A R_\theta(t,u)\left(\sum_h c_h\psi_h(t)\right)dt = \sum_h c_h\left(\int_0^A R_\theta(t,u)\psi_h(t)\,dt\right) = \sum_{h=1}^{\infty}\frac{c_h\,\psi_h(u)}{\lambda_h}, $$

using (4.2.22).

Thus,

$$ \delta_h = \frac{c_h}{\lambda_h} \quad (h = 1,2,\ldots). \tag{4.2.24} $$

Hence (4.2.23) can be rewritten as

$$ a(t) = \sum_{h=1}^{\infty}\lambda_h\,\delta_h\,\psi_h(t) \tag{4.2.25} $$

which is the solution of our equation, provided $\int_0^A a^2(t)\,dt < \infty$,
which is the same as saying that

$$ \sum_{h=1}^{\infty}\lambda_h^2\,\delta_h^2 < \infty. \tag{4.2.26} $$

If the infinite series in (4.2.26) diverges, then our
equation does not have a solution in the $L_2$-class.

4.3 WHEN THE PROCESS IS STATIONARY 

We divide the analysis into two cases depending on the
nature of the parameter set of the process $\{x(t), t \in T\}$.

Case 1. When T is a finite interval $[0,A]$.

We shall discuss two types of kernels for which straightforward
procedures for solving (4.2.8) are available for a given $\theta$.

Type A (Rational Kernels)

Let $R_\theta(t,u) = R_\theta(t-u)$, $t,u \in [0,A]$.

Let the Fourier transform of this be

$$ S_\theta(\omega) = \int_{-\infty}^{\infty}e^{i\omega\tau}R_\theta(\tau)\,d\tau = \frac{N(\omega^2)}{D(\omega^2)}. \tag{4.3.1} $$

This is the ratio of two polynomials in $\omega^2$. The kernels
whose transforms satisfy (4.3.1) are called rational kernels.

The basic technique is to find a differential equation
corresponding to the integral equation. Because of the form
of the kernel, this will be a differential equation with
constant coefficients whose solution can be readily
obtained.

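Let $\delta_D(t)$ be the Dirac delta function, so that

$$ \int_{-\infty}^{\infty}\delta_D(t)\,e^{i\omega t}\,dt = 1 $$

(see Appendix G)

$$ \Rightarrow \delta_D(t-u) = \int_{-\infty}^{\infty}e^{i\omega(t-u)}\,\frac{d\omega}{2\pi}. \tag{4.3.2} $$

Differentiating this with respect to t gives (Appendix G)

$$ p\,\delta_D(t-u) = \int_{-\infty}^{\infty}i\omega\,e^{i\omega(t-u)}\,\frac{d\omega}{2\pi} \tag{4.3.3} $$

where $p = \dfrac{d}{dt}$.

More generally,

$$ N(-p^2)\,\delta_D(t-u) = \int_{-\infty}^{\infty}N(\omega^2)\,e^{i\omega(t-u)}\,\frac{d\omega}{2\pi}. \tag{4.3.4} $$

In an analogous fashion,

$$ D(-p^2)\,R_\theta(t-u) = \int_{-\infty}^{\infty}D(\omega^2)\,S_\theta(\omega)\,e^{i\omega(t-u)}\,\frac{d\omega}{2\pi} = \int_{-\infty}^{\infty}N(\omega^2)\,e^{i\omega(t-u)}\,\frac{d\omega}{2\pi}. \tag{4.3.5} $$

Comparing (4.3.4) and (4.3.5), we get

$$ N(-p^2)\,\delta_D(t-u) = D(-p^2)\,R_\theta(t-u). \tag{4.3.6} $$

Operating on both sides of (4.2.8) with $D(-p^2)$, we obtain,

$$ D(-p^2)\,\delta(t) = \int_0^A D(-p^2)R_\theta(t-u)\,a(u)\,du $$

$$ = \int_0^A N(-p^2)\,\delta_D(t-u)\,a(u)\,du, \quad \text{(using 4.3.6)} $$

$$ = N(-p^2)\,a(t), \quad 0 \le t \le A, $$

i.e. the differential equation of our interest is

$$ D(-p^2)\,\delta(t) = N(-p^2)\,a(t). \tag{4.3.7} $$

Some specific examples are given in Van Trees ([63])
for illustration.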

The following simple example illustrates the technique.

Example 4.3.1.

Take $R_1(t-u) = R_2(t-u) = R(\tau) = e^{-|\tau|}$, $(-\infty < \tau < \infty)$,

so that $S_\theta(\omega) = \dfrac{4}{1+\omega^2}$.

Therefore, $N(\omega^2) = 4$, $D(\omega^2) = 1 + \omega^2$,

and the differential equation (4.3.7) is

$$ -\frac{d^2\delta(t)}{dt^2} + \delta(t) = 4\,a(t), \quad 0 \le t \le A, $$

whence $a(t)$ can be readily obtained.
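For reference, the transform behind this example can be worked out explicitly. The derivation below is a sketch under two assumptions that are inferred from $N(\omega^2) = 4$ and from the form of the differential equation rather than read legibly from the scan: the common covariance is $R(\tau) = e^{-|\tau|}$, and $\theta = -1$, so that $R_\theta(\tau) = 2e^{-|\tau|}$.

```latex
\begin{align*}
S_\theta(\omega)
  &= \int_{-\infty}^{\infty} 2e^{-|\tau|}\,e^{i\omega\tau}\,d\tau
   = 2\left[\int_{0}^{\infty} e^{-(1-i\omega)\tau}\,d\tau
          + \int_{-\infty}^{0} e^{(1+i\omega)\tau}\,d\tau\right] \\
  &= 2\left[\frac{1}{1-i\omega} + \frac{1}{1+i\omega}\right]
   = \frac{4}{1+\omega^{2}},
\end{align*}
% hence N(\omega^2) = 4 and D(\omega^2) = 1 + \omega^2, so that
% D(-p^2) = 1 - p^2 and (4.3.7) becomes
\begin{equation*}
\left(1 - \frac{d^{2}}{dt^{2}}\right)\delta(t) = 4\,a(t),
\qquad 0 \le t \le A .
\end{equation*}
```

On a finite interval, Van Trees-type solutions of such equations generally also carry terms concentrated at the endpoints; these are not shown in this sketch.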

Type B. Let us consider the problem,

$$ x(t) = \begin{cases} \mu_1(t) + n_1(t), & \text{under } H_1 \\ \mu_2(t) + n_2(t), & \text{under } H_2, \end{cases} \qquad t \in [0,A]. $$

Suppose $n_1(t)$ contains only white noise,
i.e. $E\,n_1(t)\,n_1(u) = \delta_D(t-u)$.

Then (4.2.8) reduces to

$$ \int_0^A \delta_D(t-u)\,a(t)\,dt - \theta\int_0^A R_2(t,u)\,a(t)\,dt = \delta(u) $$

or,

$$ a(u) - \theta\int_0^A R_2(t,u)\,a(t)\,dt = \delta(u) \tag{4.3.8} $$

which is known as the Fredholm integral equation of the
second kind.


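Assume $R_2(t,u)$ is an $L_2$ kernel and $\int_0^A R_2^2(t,u)\,du < \infty$.

We note that $a(u) - \delta(u)$ has an integral representation
of the form (4.2.19). Hence by the Hilbert-Schmidt theorem,
it can be represented by an absolutely and uniformly convergent
series of the type:

$$ a(u) - \delta(u) = \sum_{h=1}^{\infty}d_h\,\psi_h(u) \tag{4.3.9} $$

where the $\psi_h(u)$'s are given by

$$ \int_0^A R_2(t,u)\,\psi_h(u)\,du = \frac{\psi_h(t)}{\lambda_h} \tag{4.3.10} $$

and

$$ d_h = \int_0^A (a(u) - \delta(u))\,\psi_h(u)\,du = c_h - \delta_h \tag{4.3.11} $$

with

$$ \delta_h = \int_0^A \delta(u)\,\psi_h(u)\,du, \qquad c_h = \int_0^A a(u)\,\psi_h(u)\,du. $$

But

$$ d_h = \theta\int_0^A\left(\int_0^A R_2(t,u)\,a(t)\,dt\right)\psi_h(u)\,du = \theta\int_0^A\left(\int_0^A R_2(t,u)\,\psi_h(u)\,du\right)a(t)\,dt $$

(since $\int_0^A a^2(t)\,dt < \infty$ and $\int_0^A R_2(t,u)\psi_h(u)\,du < \infty$, interchanging the
order of integration is permitted)

$$ = \theta\int_0^A \frac{\psi_h(t)}{\lambda_h}\,a(t)\,dt, \quad \text{(by 4.2.35)} $$

$$ = \theta\,\frac{c_h}{\lambda_h}, \quad \text{(by 4.2.36)} \tag{4.3.12} $$

$$ \Rightarrow c_h - \delta_h = \theta\,\frac{c_h}{\lambda_h} \;\Rightarrow\; c_h = \frac{\lambda_h\,\delta_h}{\lambda_h - \theta}. \tag{4.3.13} $$

Using (4.3.13),

$$ d_h = \frac{\theta\,\delta_h}{\lambda_h - \theta}. \tag{4.3.14} $$

Hence from (4.3.9),

$$ a(u) = \delta(u) + \theta\sum_{h=1}^{\infty}\frac{\delta_h\,\psi_h(u)}{\lambda_h - \theta} = \delta(u) + \theta\sum_{h=1}^{\infty}\int_0^A \frac{\psi_h(t)\,\psi_h(u)}{\lambda_h - \theta}\,\delta(t)\,dt \tag{4.3.15} $$

provided $\theta$ is not an eigenvalue of $R_2(t,u)$.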
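The eigenfunction expansion can be tried out on a grid. The following is a minimal Nystrom-type sketch, not a method taken from the thesis: it assumes the equation $a(u) - \theta\int_0^A R_2(t,u)a(t)\,dt = \delta(u)$, replaces the integral by a Riemann sum, and uses a hypothetical kernel $e^{-|t-u|}$, mean difference $\cos(\pi t/2)$ and $\theta = -1$. The eigenvalues returned by `numpy.linalg.eigh` play the role of $1/\lambda_h$.

```python
import numpy as np

def solve_second_kind(kernel, delta, theta, grid):
    """Discretized solution of a(u) - theta * int R2(t,u) a(t) dt = delta(u)
    through the eigen-expansion a = delta + theta * sum_h delta_h psi_h / (lambda_h - theta)."""
    h = grid[1] - grid[0]
    K = kernel(grid[:, None], grid[None, :]) * h      # discretized integral operator
    m, V = np.linalg.eigh(K)                          # m_h corresponds to 1 / lambda_h
    delta_h = V.T @ delta                             # expansion coefficients of delta
    # theta * delta_h / (lambda_h - theta) rewritten with m_h = 1 / lambda_h
    d = theta * m * delta_h / (1.0 - theta * m)
    return delta + V @ d, K

# Hypothetical ingredients, chosen only for illustration.
grid = np.linspace(0.0, 5.0, 60)
kernel = lambda t, u: np.exp(-np.abs(t - u))
delta = np.cos(np.pi * grid / 2.0)
theta = -1.0

a_expansion, K = solve_second_kind(kernel, delta, theta, grid)
a_direct = np.linalg.solve(np.eye(len(grid)) - theta * K, delta)
```

On the grid the expansion and the direct linear solve agree to rounding error, which is a discrete counterpart of the requirement that $\theta$ must not coincide with an eigenvalue $\lambda_h$.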



When $R_1(t,u) = R_2(t,u)$, (4.2.8) reduces to

$$ 2\int_0^A R_1(t,u)\,a(t)\,dt = \delta(u), \tag{4.3.16} $$

because $\theta = -1$.

The equation (4.3.16) is studied widely in ([63]). Let us
consider an important special case. Suppose $n_1(t)$ and
$n_2(t)$ both are white.

Since $\delta_D(t-u)$ is a p.d. function, by Mercer's theorem,

$$ \delta_D(t-u) = \sum_{h=1}^{\infty}\psi_h(t)\,\psi_h(u) \tag{4.3.17} $$

where the $\psi_h(t)$'s are eigenfunctions of $\delta_D(t-u)$.

Using (4.3.17), from (4.3.15) we obtain,

$$ a(u) = \delta(u) - \frac{1}{2}\int_0^A\left(\sum_{h=1}^{\infty}\psi_h(t)\psi_h(u)\right)\delta(t)\,dt $$

$$ = \delta(u) - \frac{1}{2}\int_0^A \delta_D(t-u)\,\delta(t)\,dt $$

$$ = \delta(u) - \frac{\delta(u)}{2} = \frac{\delta(u)}{2}, $$

which is a well-known result (see [63]).

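Case 2. When T is an infinite interval $[0,\infty)$.

We make the following assumptions analogous to those
required for the discrete-time parameter process.

C1. The difference of the mean functions $\delta(t)$ satisfies

i) $\sup_t |\delta(t)| < \infty$,

ii) $\displaystyle\lim_{A\to\infty}\frac{1}{A}\int_0^A \delta(t)\,\delta(t+\tau)\,dt = \int_{-\pi}^{\pi}e^{i\tau\lambda}\,\frac{dM(\lambda)}{2\pi}$, $0 < \tau < \infty$,

where $M(\lambda)$ is as given in Chapter III.

C2. The covariance functions $R_j(v)$ satisfy

$$ \int_0^{\infty}|v|^{1+\beta}\,|R_j(v)|\,dv < \infty, \quad \text{where } 0 < \beta < 1, \ (j = 1,2). $$

C3. The spectral density of the covariance stationary
process $\{x(t), 0 < t < \infty\}$ is zero outside the frequency interval
$-\pi \le \lambda \le \pi$.

Then we have a theorem which gives an explicit
expression for $a(t)$.

Theorem 4.3.1: Assume C1 to C3 above are satisfied
while the underlying process is covariance stationary.

Then the optimal $a(t)$ is given by

$$ a(t) = \int_{-\infty}^{\infty}\frac{D(\lambda)}{S(\lambda)}\,e^{-it\lambda}\,\frac{d\lambda}{2\pi} \tag{4.3.18} $$

where

$$ D(\lambda) = \int_0^{\infty}\delta(u)\,e^{i\lambda u}\,du \tag{4.3.19} $$

and

$$ S(\lambda) = \int_{-\infty}^{\infty}R(v)\,e^{i\lambda v}\,dv. \tag{4.3.20} $$

We need the following lemmas to prove the theorem.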

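Lemma 4.3.1. Let $\mathbb{R} = (-\infty, +\infty)$. Assume that $F(x)$ and
$G(x)$ are Lebesgue integrable on $\mathbb{R}$ and that at least one of
F and G is continuous and bounded on $\mathbb{R}$. Let

$$ H(x) = \int_{-\infty}^{\infty}F(t)\,G(x-t)\,dt. $$

Then for every real $\lambda$, we have,

$$ \int_{-\infty}^{\infty}H(x)\,e^{i\lambda x}\,dx = \left(\int_{-\infty}^{\infty}F(y)\,e^{i\lambda y}\,dy\right)\left(\int_{-\infty}^{\infty}G(z)\,e^{i\lambda z}\,dz\right). \tag{4.3.21} $$

Proof: See Apostol ([6], pp. 329-331).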

Lemma 4.3.2. If for some $\Lambda$ (real number), the spectral
density of a covariance stationary process $\{X(t),\ 0 < t < \infty\}$
is zero outside of the frequency interval $-\Lambda \le \lambda \le \Lambda$,
then the process can be exactly reconstructed from its values
at the time points $k\pi/\Lambda$ $(k = 0,1,2,\dots)$.
More precisely,

    $X(t) = \sum_{k=0}^{\infty} \frac{\sin \Lambda(t - k\pi/\Lambda)}{\Lambda(t - k\pi/\Lambda)}\,X\Big(\frac{k\pi}{\Lambda}\Big), \quad (0 < t < \infty)$    (4.3.22)

Proof: This result is known as Shannon's sampling theorem,
the proof of which can be found in Koopmans ([29]).
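
The reconstruction formula of Lemma 4.3.2 can be tried numerically. The following is only a minimal sketch under stated assumptions: the test signal `signal`, the truncation of the index set to a finite two-sided range, and the evaluation points are all illustrative choices that do not come from the thesis.

```python
import numpy as np

def sinc_reconstruct(samples, k, t, band):
    """Evaluate sum_k x(k*pi/band) * sin(band*(t - k*pi/band)) / (band*(t - k*pi/band))."""
    t = np.atleast_1d(t).astype(float)
    # np.sinc(u) = sin(pi*u)/(pi*u), so the kernel above equals np.sinc(band*t/pi - k)
    kernel = np.sinc(band * t[:, None] / np.pi - k[None, :])
    return kernel @ samples

band = np.pi                       # spectrum confined to [-pi, pi], as in C3
k = np.arange(-2000, 2001)         # assumption: truncated, two-sided index set

def signal(t):
    # hypothetical band-limited test signal (a shifted sinc)
    return np.sinc(np.asarray(t, dtype=float) - 3.3)

t_new = np.array([0.25, 1.5, 7.9])
approx = sinc_reconstruct(signal(k * np.pi / band), k, t_new, band)
max_err = np.max(np.abs(approx - signal(t_new)))
print(max_err)
```

The residual error comes only from truncating the series; it shrinks as the index range is widened.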


Proof of Theorem 4.3.1:

By Lemma 4.3.2, we have

    $x(t) = \sum_{n=0}^{\infty} \varphi_n(t)\,x(n),$    (4.3.23)

where $\varphi_n(t) = \frac{\sin \pi(t-n)}{\pi(t-n)}$.

Thus our sampled process is $\{x(k),\ k = 0,1,2,\dots\}$, which
is also covariance stationary and satisfies all the assumptions
A1 to A4 laid down in Section 3.5 of Chapter III. By
an application of the theory therein, if we consider $K$
terms in the series of (4.3.23), we have

    $a = (R_1 + R_2)^{-1}\,\delta.$

Now we let $K \to \infty$. Then by a similar argument as
used in Section 4.2, we get,

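    $\int_0^{\infty} R(t-u)\,a(t)\,dt = \delta(u)$    (4.3.24)

where $R(t-u) = R_1(t-u) + R_2(t-u)$.

Let $A(\lambda) = \int_0^{\infty} a(t)\,e^{-i\lambda t}\,dt$, i.e. $A(\lambda)$ is the Fourier
transform of $a(t)$.

Then, using Lemma 4.3.1, we obtain from (4.3.24),

    $S(\lambda)\,A(\lambda) = D(\lambda),$    (4.3.25)

where $D(\lambda)$ and $S(\lambda)$ are defined in (4.3.19) and (4.3.20).
(4.3.25) implies

    $A(\lambda) = \frac{D(\lambda)}{S(\lambda)} \implies a(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{D(\lambda)}{S(\lambda)}\,e^{i\lambda t}\,d\lambda,$

which is the same as (4.3.18). This completes the proof.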


4.4 CONCLUSION 

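The linear discriminant function is

    $y = \int_0^A a(t)\,x(t)\,dt.$

Our problem was to find an $a(t)$ such that $-\ln\rho_2(1,2;y)$
given by

    $-\ln\rho_2(1,2;y) = -\frac{1}{4}\ln\Big(\int_0^A\!\!\int_0^A a(t)R_1(t,u)a(u)\,dt\,du\Big)$

    $\qquad - \frac{1}{4}\ln\Big(\int_0^A\!\!\int_0^A a(t)R_2(t,u)a(u)\,dt\,du\Big)$

    $\qquad + \frac{1}{2}\ln\Big(\frac{1}{2}\int_0^A\!\!\int_0^A a(t)R(t,u)a(u)\,dt\,du\Big)$

    $\qquad + \frac{1}{4}\,\frac{\big(\int_0^A a(t)\delta(t)\,dt\big)^2}{\int_0^A\!\!\int_0^A a(t)R(t,u)a(u)\,dt\,du}$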


is a maximum. We attempted the problem through the sampling
technique. We noted that no compact form of a(t) is available
except in the case when our basic process is covariance stationary
with an infinite parameter set.


CHAPTER V 
DESIGN OF SIGNALS 


5.1 INTRODUCTION 

This chapter treats the problem of selecting a signal 
embedded in normally distributed additive noise with zero 
mean and possibly unequal covariance matrices. We state the 
problem mathematically in Section 5.2. The various methods 
of obtaining an optimal signal are discussed in Section 5.3 
while Section 5.4 contains some numerical results.

5.2 MATHEMATICAL STATEMENT OF THE PROBLEM OF SIGNAL SELECTION 
A fairly general mathematical model of the (discrete) time series
$\{X(t),\ t \in T\}$ can be written as

    $X(t) = \mu(t) + n(t),$

where $\mu(t)$ is a completely deterministic process and $n(t)$
is a stochastic process. They are sometimes called "signal"
and "noise" respectively.

Let

    $X(t) = \begin{cases} \mu(t) + n_1(t), & \text{under } H_1 \\ n_2(t), & \text{under } H_2, \end{cases}$

where the $n_j(t)$'s $(j = 1,2)$ are normal processes.

Let $x$ denote $n$ observations on $X(t)$, i.e.,

    $x = (X(0), \dots, X(n-1))'$

    $\mu = (\mu(0), \dots, \mu(n-1))'$

    $n_j = (n_j(0), \dots, n_j(n-1))'$

where $n_j \sim N_n(0, R_j)$ under $H_j$ $(j = 1,2)$,
or equivalently, $x \sim N_n(\mu, R_1)$ under $H_1$
and $x \sim N_n(0, R_2)$ under $H_2$.

Our object (from the viewpoint of classification) is to
minimize the Bayes risk or, if we attach equal costs to the two
types of errors, to minimize the total probability
of misclassification. For a given $\mu$, the signal vector,
this probability will also be a function of $\mu$, and one
naturally asks: which is the signal vector that minimizes
the total probability of error resulting from the use of
the Bayes optimal classification rule subject to the restriction
$\mu'\mu = 1$? Unfortunately, it seems difficult to carry out the
direct minimization involved. Hence we resort to some signal
selection criteria ([22,26,48]) that may be weaker than the
error probability but are easier to evaluate and manipulate.

Our first step is to classify $x$ into $H_j$ $(j = 1,2)$ optimally
in some sense. This we do via the linear procedures considered
in earlier chapters. But our criterion is now that of minimizing
the total probability of error resulting from setting up
the logical requirement that each probability of misclassification
is the same when no knowledge is available about the a priori
probability of the hypotheses. If we denote the total probability
of error by Pr(e), then our problem is



    $\min_{\mu}\ \min_{a}\ \Pr(e)$    (5.2.1)


5.3 METHODS OF SIGNAL SELECTION 

If we apply the classification scheme

    $a'x\ \underset{H_2}{\overset{H_1}{\gtrless}}\ c$    (accept)

then, as we have seen in Chapter III, two types of errors result,
which are given by

    $e_1 = 1 - \Phi(y_1)$, where $y_1 = \frac{a'\mu - c}{(a'R_1a)^{1/2}}$    (5.3.1)

and

    $e_2 = 1 - \Phi(y_2)$, where $y_2 = \frac{c}{(a'R_2a)^{1/2}}$.    (5.3.2)

Set $e_1 = e_2$, which is equivalent to saying $y_1 = y_2$,
since $\Phi(x)$ is monotone in $x$.

This implies

    $c = \frac{a'\mu\,(a'R_2a)^{1/2}}{(a'R_1a)^{1/2} + (a'R_2a)^{1/2}}$    (5.3.3)

Putting this value of $c$ in the expression for $y_1$ in (5.3.1),
we obtain

    $y_1 = \frac{a'\mu}{(a'R_1a)^{1/2} + (a'R_2a)^{1/2}}$    (5.3.4)


which is positive, since the case $y_1 < 0$ is of no interest,
because the minimum Pr(e) achievable in this case is $\frac{1}{2}$.

Now, if $\pi_j$ is the a priori probability of $H_j$ $(j = 1,2)$,
then the total probability of error is given by

    $\Pr(e) = \pi_1 e_1 + \pi_2 e_2 = e_1,$

since $e_1 = e_2$.
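
The equal-error threshold can be checked numerically. The sketch below assumes the relations $y_1 = (a'\mu - c)/(a'R_1a)^{1/2}$ and $y_2 = c/(a'R_2a)^{1/2}$ for the two error arguments and the threshold $c = a'\mu\,(a'R_2a)^{1/2}/((a'R_1a)^{1/2} + (a'R_2a)^{1/2})$; the matrices and the vectors `a` and `mu` are illustrative values only, and the function names are hypothetical.

```python
import math
import numpy as np

def equal_error_rule(a, mu, R1, R2):
    """Threshold c that equalises the two error probabilities, with y1 and y2."""
    s1 = math.sqrt(a @ R1 @ a)             # (a'R1a)^(1/2)
    s2 = math.sqrt(a @ R2 @ a)             # (a'R2a)^(1/2)
    c = (a @ mu) * s2 / (s1 + s2)          # equal-error threshold
    y1 = (a @ mu - c) / s1                 # argument of Phi for the first error
    y2 = c / s2                            # argument of Phi for the second error
    return c, y1, y2

def normal_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative inputs (assumptions, not data from the thesis).
R1 = np.array([[0.12, 0.15], [0.15, 0.525]])
R2 = np.array([[1.04, 0.70], [0.70, 1.25]])
mu = np.array([0.6, 0.8])                  # unit-norm signal vector
a = np.array([1.0, 0.3])

c, y1, y2 = equal_error_rule(a, mu, R1, R2)
e1, e2 = 1.0 - normal_cdf(y1), 1.0 - normal_cdf(y2)
print(c, y1, y2, e1, e2)
```

With this choice of c both arguments collapse to $a'\mu / ((a'R_1a)^{1/2} + (a'R_2a)^{1/2})$, which is the quantity maximized in the sequel.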


Since $\Phi(y_1)$ is monotone in $y_1$, our problem (5.2.1) can
be reformulated as

    $\max_{\mu'\mu = 1}\ \max_{a}\ \frac{a'\mu}{(a'R_1a)^{1/2} + (a'R_2a)^{1/2}},$    (5.3.5)

which can also be written as

    $y_1^{**} = \max_{\mu'\mu = 1}\ \max_{a}\ \frac{a'\mu}{(a'R_1a)^{1/2} + (a'R_2a)^{1/2}}.$    (5.3.6)

This is due to the assumption that $a$ belongs to an n-cell in
$\mathbb{R}^n$, the n-dimensional Euclidean space.

Now,

    $y_1^{**} = \max_{a}\ \frac{(a'a)^{1/2}}{(a'R_1a)^{1/2} + (a'R_2a)^{1/2}}$    (5.3.7)

since by the Cauchy-Schwarz inequality,

    $(a'\mu)^2 \le (a'a)(\mu'\mu),$

where the equality is obtained when

    $\mu = \frac{a}{(a'a)^{1/2}}.$    (5.3.8)

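Since $R_1$ and $R_2$ are p.d. matrices, they can be simultaneously
diagonalized (see Anderson [3], p. 341). There is a non-singular
matrix $P$ such that

    $R_1 = P'\Lambda P$
and $R_2 = P'P$    (5.3.9)

where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, and $\lambda_1, \dots, \lambda_n$ are
the roots of

    $|R_1 - \lambda R_2| = 0.$    (5.3.10)

Thus we can reduce (5.3.7) to a simpler form

    $y_1^{**} = \max_{a'a = 1}\ \frac{1}{(a'P'\Lambda Pa)^{1/2} + (a'P'Pa)^{1/2}}$

    $\phantom{y_1^{**}} = \max_{\beta'Q\beta = 1}\ \frac{1}{(\beta'\Lambda\beta)^{1/2} + (\beta'\beta)^{1/2}}$

    $\phantom{y_1^{**}} = \frac{1}{\min_{\beta'Q\beta = 1}\ \big((\beta'\Lambda\beta)^{1/2} + (\beta'\beta)^{1/2}\big)},$    (5.3.11)

where $Q = (PP')^{-1}$, and $\beta = Pa$.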

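We can try to solve (5.3.11) by Lagrange's method.

Let $L = (\beta'\Lambda\beta)^{1/2} + (\beta'\beta)^{1/2} + r(\beta'Q\beta - 1)$, where $r$ is the
Lagrange multiplier.

    $\frac{\partial L}{\partial \beta} = 0$

    $\implies \frac{\Lambda\beta}{(\beta'\Lambda\beta)^{1/2}} + \frac{\beta}{(\beta'\beta)^{1/2}} + r(2Q\beta) = 0$

    $\implies \frac{\beta'\Lambda\beta}{(\beta'\Lambda\beta)^{1/2}} + \frac{\beta'\beta}{(\beta'\beta)^{1/2}} + 2r\,\beta'Q\beta = 0$

    $\implies r = -\frac{1}{2}\big((\beta'\Lambda\beta)^{1/2} + (\beta'\beta)^{1/2}\big).$

Thus the $\beta$ for which (5.3.11) is a minimum satisfies

    $\frac{\Lambda\beta}{(\beta'\Lambda\beta)^{1/2}} + \frac{\beta}{(\beta'\beta)^{1/2}} = \big((\beta'\Lambda\beta)^{1/2} + (\beta'\beta)^{1/2}\big)\,Q\beta$

or,

    $\frac{PP'\Big(\dfrac{\Lambda\beta}{(\beta'\Lambda\beta)^{1/2}} + \dfrac{\beta}{(\beta'\beta)^{1/2}}\Big)}{(\beta'\Lambda\beta)^{1/2} + (\beta'\beta)^{1/2}} = \beta,$    (5.3.12)

which one can solve iteratively for $\beta$.

Once we obtain $\beta$, we can recover $a$ from the relation

    $a = P^{-1}\beta.$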
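
A direct way to experiment with the fixed-point relation for $\beta$ is to code the map $\beta \mapsto PP'\big(\Lambda\beta/(\beta'\Lambda\beta)^{1/2} + \beta/(\beta'\beta)^{1/2}\big) / \big((\beta'\Lambda\beta)^{1/2} + (\beta'\beta)^{1/2}\big)$ and inspect its residual. The sketch below is an illustration only: the function names are hypothetical, and the sanity case ($\Lambda = I$, where a suitably scaled eigenvector of $PP'$ must be a fixed point) is an assumption chosen because it can be verified by hand.

```python
import numpy as np

def fixed_point_map(beta, P, lam):
    """PP'[L b/(b'L b)^(1/2) + b/(b'b)^(1/2)] / ((b'L b)^(1/2) + (b'b)^(1/2)), L = diag(lam)."""
    s_lam = np.sqrt(beta @ (lam * beta))      # (b' L b)^(1/2)
    s_id = np.sqrt(beta @ beta)               # (b' b)^(1/2)
    direction = lam * beta / s_lam + beta / s_id
    return (P @ P.T) @ direction / (s_lam + s_id)

def residual(beta, P, lam):
    """Distance between beta and its image; zero exactly at a solution."""
    return np.linalg.norm(fixed_point_map(beta, P, lam) - beta)

# Sanity case (assumption): with Lambda = I the map reduces to PP' b / (b'b),
# so b = sqrt(kappa) * u is a fixed point for any eigenpair (kappa, u) of PP'.
P = np.array([[1.0, 0.5], [0.2, 1.0]])
kappa, U = np.linalg.eigh(P @ P.T)
beta = np.sqrt(kappa[0]) * U[:, 0]
res = residual(beta, P, np.ones(2))
print(res)
```

Repeatedly applying `fixed_point_map` is the simplest iteration one could try; nothing here guarantees that it converges for a general $\Lambda$.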


Remark 5.3.1: One can follow the following steps in order to
obtain $P$ (see Anderson [3], pp. 339-341):

(i) Find an orthogonal matrix $C$ which diagonalizes $R_2$.

(ii) Form $E = CD^{-1/2}$, where the diagonal matrix $D$
contains the eigenvalues of $R_2$ as its diagonal elements.

(iii) Form $F = E'R_1E$.

(iv) Find an orthogonal matrix $H$ which diagonalizes $F$.

(v) Form $EH$.

(vi) Then $P = (EH)^{-1}$.
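
The six steps translate almost line for line into NumPy. This is a sketch, not the author's program: the function name `simultaneous_diagonalizer` is hypothetical and the 2 x 2 matrices are illustrative inputs. The resulting P is determined only up to the ordering and signs of the eigenvectors, so it need not coincide with any particular hand-computed P, but it should satisfy $R_2 = P'P$ and $R_1 = P'\Lambda P$.

```python
import numpy as np

def simultaneous_diagonalizer(R1, R2):
    """Return (P, lam) with R2 = P'P and R1 = P' diag(lam) P."""
    d, C = np.linalg.eigh(R2)          # (i)   C'R2C = diag(d), C orthogonal
    E = C / np.sqrt(d)                 # (ii)  E = C D^(-1/2): scales column j by d_j^(-1/2)
    F = E.T @ R1 @ E                   # (iii) F = E'R1E
    lam, H = np.linalg.eigh(F)         # (iv)  H'FH = diag(lam), H orthogonal
    P = np.linalg.inv(E @ H)           # (v)-(vi) P = (EH)^(-1)
    return P, lam

# Illustrative positive definite inputs.
R1 = np.array([[0.12, 0.15], [0.15, 0.525]])
R2 = np.array([[1.04, 0.70], [0.70, 1.25]])
P, lam = simultaneous_diagonalizer(R1, R2)
print(lam)
```

The entries of `lam` are the roots of $|R_1 - \lambda R_2| = 0$, returned in ascending order because `numpy.linalg.eigh` sorts eigenvalues that way.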

Remark 5.3.2: It seems that (5.3.12) does not admit of any
analytical solution. Of course, one can solve it numerically.
However, we do not know whether the iteration involved will
converge all the time. Even if it does, it is not certain
that the solution obtained would give a minimum. Let us give
an example.

Take

    $R_1 = \begin{pmatrix} .12 & .15 \\ .15 & .525 \end{pmatrix}, \quad R_2 = \begin{pmatrix} 1.04 & .7 \\ .7 & 1.25 \end{pmatrix}.$

Then

    $P = \begin{pmatrix} 1 & .5 \\ .2 & 1 \end{pmatrix}, \quad \Lambda = \begin{pmatrix} .1 & 0 \\ 0 & .5 \end{pmatrix}.$

A solution of (5.3.12) is (0. , 0.9543)', but one can
show this does not give the minimum sought for.

This difficulty in carrying out the above procedure leads
one to search for some signal selection criteria that may be
weaker than the error probability but are easier to evaluate
and manipulate. A possible way out is to resort to some bound
on the total probability of error and minimize that bound. Before
this issue is taken up in Method 2, we want to make some
remarks.





Remark 5.3.3: (i) Let $R_2$ be a scalar matrix, i.e. $R_2 = \nu I$.
Then from (5.3.7), we have,

    $y_1^{**} = \max_{a}\ \frac{(a'a)^{1/2}}{(a'R_1a)^{1/2} + (\nu\,a'a)^{1/2}}$

    $\phantom{y_1^{**}} = \max_{a}\ \frac{1}{\big(\frac{a'R_1a}{a'a}\big)^{1/2} + \nu^{1/2}}$

    $\phantom{y_1^{**}} = \frac{1}{(\lambda_{\min}(R_1))^{1/2} + \nu^{1/2}},$    (5.3.13)

since $\min_{a} \frac{a'R_1a}{a'a} = \lambda_{\min}(R_1)$ (see Rao ([57])).

The maximum is attained for the $a_0$ which is an eigenvector
corresponding to the minimum eigenvalue of $R_1$.

(ii) Let $R_1 = R_2$.

Then (5.3.5) reduces to

    $\max_{\mu'\mu = 1}\ \max_{a}\ \frac{a'\mu}{2(a'R_1a)^{1/2}} = \frac{1}{2}\max_{\mu'\mu = 1} (\mu'R_1^{-1}\mu)^{1/2}$, (see Rao ([57]))

    $\phantom{\max\max} = \frac{1}{2}(\lambda_{\max}(R_1^{-1}))^{1/2},$

the maximum being attained for $a^{*} = R_1^{-1}\mu_0$, where $\mu_0$ is the
eigenvector corresponding to the minimum eigenvalue of $R_1$.
This is a well known result (see [48]).


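Method 2. The method is described in the following
theorem.

Theorem 5.3.1: Our signal selection criterion is

    $\max_{\mu'\mu = 1}\ \max_{a}\ \frac{a'\mu}{(a'R_ja)^{1/2}}$  $(j = 1 \text{ or } 2)$    (5.3.14)

where the optimal $a$ and $\mu$ are given by:

    $a_0 = R_j^{-1}\mu_0$

and $\mu_0$ is an eigenvector corresponding to $\lambda_{\min}(R_1)$ or
$\lambda_{\min}(R_2)$ according as

    $\frac{(\lambda_{\max}(R_1^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_1^{-1}R_2))^{1/2}}\ \ge \text{ or } <\ \frac{(\lambda_{\max}(R_2^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_2^{-1}R_1))^{1/2}}.$    (5.3.15)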


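Proof: We have from (5.3.4)

    $y_1 = \frac{a'\mu}{(a'R_1a)^{1/2} + (a'R_2a)^{1/2}},$

which we can write as

    $y_1 = \frac{a'\mu/(a'R_2a)^{1/2}}{1 + \big(\frac{a'R_1a}{a'R_2a}\big)^{1/2}}\ \ge\ \frac{a'\mu/(a'R_2a)^{1/2}}{1 + (\lambda_{\max}(R_2^{-1}R_1))^{1/2}},$    (5.3.16)

see Rao ([57]).

Thus our signal selection criterion is

    $B_2^{**} = \max_{\mu'\mu = 1}\ \max_{a}\ \frac{a'\mu}{(a'R_2a)^{1/2}}.$    (5.3.17)

Now,

    $B_2^{**} = \max_{\mu'\mu = 1} (\mu'R_2^{-1}\mu)^{1/2} = (\lambda_{\max}(R_2^{-1}))^{1/2}.$

The maximum is attained at $a_0 = R_2^{-1}\mu_0$, where the optimal $\mu$
is denoted by $\mu_0$, which is an eigenvector corresponding to
$\lambda_{\min}(R_2)$. Consequently,

    $y_1^{**} \ge \frac{(\lambda_{\max}(R_2^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_2^{-1}R_1))^{1/2}}.$

In a similar manner we get, by interchanging $R_1$ and $R_2$ in the
above,

    $y_1^{**} \ge \frac{(\lambda_{\max}(R_1^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_1^{-1}R_2))^{1/2}}.$

Thus,

    $y_1^{**} \ge \max\Big\{\frac{(\lambda_{\max}(R_1^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_1^{-1}R_2))^{1/2}},\ \frac{(\lambda_{\max}(R_2^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_2^{-1}R_1))^{1/2}}\Big\} = B^{**}.$    (5.3.18)

This completes the proof.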
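
The relation between the bound and the exact optimum can be explored numerically in two dimensions. The sketch below assumes the bound has the form $B^{**} = \max_j (\lambda_{\max}(R_j^{-1}))^{1/2} / (1 + (\lambda_{\max}(R_j^{-1}R_k))^{1/2})$ with $k \ne j$, and that the exact value is $y_1^{**} = \max_{a'a=1} 1/((a'R_1a)^{1/2} + (a'R_2a)^{1/2})$. The helper names are hypothetical, the brute-force grid search only works for n = 2, and the matrices use the rounded lag-one correlations .666 and .714 as an illustrative input.

```python
import numpy as np

def lam_max(M):
    """Largest eigenvalue (real for the products of p.d. matrices used here)."""
    return np.linalg.eigvals(M).real.max()

def bound_B(R1, R2):
    """Lower bound B** on y1**, assuming the form stated in the lead-in."""
    inv1, inv2 = np.linalg.inv(R1), np.linalg.inv(R2)
    b1 = np.sqrt(lam_max(inv1)) / (1.0 + np.sqrt(lam_max(inv1 @ R2)))
    b2 = np.sqrt(lam_max(inv2)) / (1.0 + np.sqrt(lam_max(inv2 @ R1)))
    return max(b1, b2)

def exact_y_2d(R1, R2, grid=200001):
    """Brute-force max over unit vectors a of 1/((a'R1a)^(1/2) + (a'R2a)^(1/2)), n = 2 only."""
    th = np.linspace(0.0, np.pi, grid)
    a = np.stack([np.cos(th), np.sin(th)])               # columns are unit vectors
    s1 = np.sqrt(np.einsum('ik,ij,jk->k', a, R1, a))     # (a'R1a)^(1/2) per column
    s2 = np.sqrt(np.einsum('ik,ij,jk->k', a, R2, a))     # (a'R2a)^(1/2) per column
    return (1.0 / (s1 + s2)).max()

R1 = np.array([[1.0, 0.666], [0.666, 1.0]])
R2 = np.array([[1.0, 0.714], [0.714, 1.0]])
B = bound_B(R1, R2)
y = exact_y_2d(R1, R2)
print(B, y)
```

For these two matrices the bound and the exact value agree, because both matrices share the same eigenvectors; in general only $B^{**} \le y_1^{**}$ is expected.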

Remark 5.3.4: If the proposed criterion is min Pr(e) subject to
$e_1 = k_1 e_2$, then it can be easily shown that the optimum signal is
the same as that stated in Theorem 5.3.1.





5.4. NUMERICAL RESULTS 

In the following two examples, we attempt to demonstrate,
wherever possible, that our bound works satisfactorily, which
means that $B^{**}$ in (5.3.18) and $y_1^{**}$ in (5.3.7) do not differ
much. The exact maximization in (5.3.7) was carried out
numerically with one easy-to-use method of Gill and Murray
(see [20]).


Example 5.4.1. Let

    $H_1$: $Z(t) = .2\,Z(t-1) + .7\,Z(t-2) + \varepsilon(t)$
    $H_2$: $Z(t) = .5\,Z(t-1) + .3\,Z(t-2) + \varepsilon(t)$

where $\{Z(t),\ t \ge 0\}$ is as in Example 3.6.1.

The first row of $R_1$ is given by (for n = 10):
(1, .666, .833, .633, .709, .585, .614, .532, .536, .480)

The first row of $R_2$ is given by (for n = 10):
(1, .714, .657, .543, .468, .397, .339, .289, .246, .209).
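
The two rows can be reproduced from the AR(2) coefficients. The sketch below assumes that the rows are autocorrelations (leading entry 1) of stationary AR(2) processes and generates them with the Yule-Walker recursion $\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2}$, started from $\rho_1 = \phi_1/(1-\phi_2)$; the function name is hypothetical. Small third-decimal differences from the printed rows are to be expected from rounding.

```python
import numpy as np

def ar2_autocorr(phi1, phi2, n):
    """First n autocorrelations of a stationary AR(2) process (n >= 2)."""
    rho = np.empty(n)
    rho[0] = 1.0
    rho[1] = phi1 / (1.0 - phi2)                 # Yule-Walker start value
    for k in range(2, n):
        rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
    return rho

row1 = ar2_autocorr(0.2, 0.7, 10)   # process with coefficients .2 and .7
row2 = ar2_autocorr(0.5, 0.3, 10)   # process with coefficients .5 and .3
print(np.round(row1, 3))
print(np.round(row2, 3))
```

A full covariance matrix for a given n is then the symmetric Toeplitz matrix built from the first n entries of the row.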


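The computations are shown in Table 5.1. We note that

    $B^{**} = \max\Big\{\frac{(\lambda_{\max}(R_1^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_1^{-1}R_2))^{1/2}},\ \frac{(\lambda_{\max}(R_2^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_2^{-1}R_1))^{1/2}}\Big\}.$
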
Example 5.4.2. Let $H_1$ and $H_2$ be the same as in Example 3.6.2. In
this case B**= ^ results 


1 / 


are shown in Table 5.2 . 





Tables 5.1 and 5.2: Computations in Examples 5.4.1 and 5.4.2

          Table 5.1                          Table 5.2

     n    B**        y1**               n    B**        y1**
     2    .89870     .89870             2    0.68217    0.68217
     3    1.01676    1.03571            3    0.72471    0.76165
     4    1.02888    1.11896            4    0.77158    0.83835
     5    1.12974    1.13848            5    0.81398    0.86866
     6    1.13499    1.19251            6    0.85997    0.89182
     7    1.15986    1.19353            7    0.86654    0.91101
     8    1.20804    1.22125            8    0.87163    0.91201
     9    1.22292    1.22492            9    0.89673    0.91986
    10    1.23346    1.24716           10    0.90164    0.92227





5.5 CONCLUSION 

We see that even in the simple case when the two types
of error are made equal, we do not obtain an explicit expression
for the optimum alpha vector and the optimum signal. An analytical
solution for the optimum signal is available only through a bound
on the total probability of misclassification. However, this
method seems to be reasonably good, as a comparison with the
exact optimum value shows.



CHAPTER VI 


LINEAR DISCRIMINANT FUNCTIONS AND DESIGN OF SIGNALS
FOR COMPLEX NORMAL TIME SERIES

6.1 INTRODUCTION 

So far we have concentrated on the real normal time 
series. Since complex normal processes are of interest in 
many applied areas ([40]), we consider the linear discriminant 
functions for complex normal time series and try to extend 
all our major results from the real to the complex case. 

Though their origin lies in engineering and the physical sciences,
complex stochastic processes, the complex normal process
in particular, have been extensively studied by statisticians
([58]). In Section 6.2 we formulate the problem of finding
a classification rule based on linear statistics which 
maximizes the Bhattacharyya distance. Results analogous to 
those in the case of real discrete time series are obtained 
in Section 6.3 whereas Section 6.4 extends the same to a 
continuous time series. The problem of design of signals 
is considered in Section 6.5. Some basic definitions on 
complex normal processes are given in Chapter II. 

6.2 FORMULATION OF THE PROBLEM 

The problem is to classify an n-dimensional observation
$z = (Z(0), \dots, Z(n-1))'$ as coming from one of the two categories
specified by two hypotheses $H_1$ and $H_2$. These hypotheses
state that the $n \times 1$ complex normal time series $z$ has the
following means and covariances under $H_1$ and $H_2$:

    $E_{H_j} z = \mu_j$

and

    $E_{H_j} (z - \mu_j)(z - \mu_j)^{*} = R_j.$

The density of $z$ is given by (under $H_j$),

    $p_j(z) = \frac{1}{\pi^n |R_j|} \exp\{-(z - \mu_j)^{*} R_j^{-1} (z - \mu_j)\}, \quad (j = 1,2)$    (6.2.1)

We assume the $R_j$ to be Hermitian positive definite covariance
matrices, and $\mu_1 \ne \mu_2$.

Let us look at the form of the likelihood ratio when
$R_1 = R_2\ (= R)$. We have,

    $\frac{p_1(z)}{p_2(z)} = \exp\{-(z - \mu_1)^{*}R^{-1}(z - \mu_1) + (z - \mu_2)^{*}R^{-1}(z - \mu_2)\}$

    $\phantom{\frac{p_1(z)}{p_2(z)}} = \exp\{(\mu_1 - \mu_2)^{*}R^{-1}z + z^{*}R^{-1}(\mu_1 - \mu_2) - \mu_1^{*}R^{-1}\mu_1 + \mu_2^{*}R^{-1}\mu_2\}$

    $\phantom{\frac{p_1(z)}{p_2(z)}} = \exp[\,2\,\mathrm{Re}\{(\mu_1 - \mu_2)^{*}R^{-1}z\} + (\mu_2^{*}R^{-1}\mu_2 - \mu_1^{*}R^{-1}\mu_1)\,].$

In the light of this, coupled with the fact that the
distributional theory associated with the quadratic discriminant
function in the case of unequal covariances is very complicated
([40]), we consider the following linear procedures:




    $\ell = \mathrm{Re}(a^{*}z)\ \underset{H_2}{\overset{H_1}{\gtrless}}\ c$    (accept)    (6.2.2)

where $a$ is an n-dimensional complex vector, and $c$ is a
scalar. We consider the solutions for $a$ corresponding to
the criterion of maximizing the Bhattacharyya distance.

6.3 DISCRETE TIME SERIES

6.3.1 METHOD OF FINDING THE ALPHA VECTOR FOR AN ARBITRARY TIME 
SERIES 


First we note that $a^{*}z$ is a one-dimensional complex random
variable distributed normally with mean $a^{*}\mu_j$ and variance
$a^{*}R_ja$ under $H_j$ $(j = 1,2)$ (see [40]). Hence (see Chapter II),

    $\ell = \mathrm{Re}(a^{*}z) \sim N\big(\mathrm{Re}(a^{*}\mu_j),\ \tfrac{1}{2}a^{*}R_ja\big)$ under $H_j$ $(j = 1,2)$.

Thus,

    $\rho_2(1,2;\ell) = \Big[\frac{(a^{*}R_1a)^{1/2}(a^{*}R_2a)^{1/2}}{\frac{1}{2}a^{*}(R_1 + R_2)a}\Big]^{1/2} \exp\Big\{-\frac{1}{2}\,\frac{(\mathrm{Re}(a^{*}\delta))^2}{a^{*}(R_1 + R_2)a}\Big\}$    (6.3.1)

where $\delta = \mu_1 - \mu_2$.


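Differentiating (Appendix H) $\ln\rho_2(1,2;\ell)$ with respect to $a$
and setting $\partial \ln\rho_2 / \partial a = 0$, we get

    $\frac{2\,\mathrm{Re}(a^{*}\delta)}{a^{*}(R_1 + R_2)a}\,\delta = \Big[\frac{1}{a^{*}R_1a} - \frac{2}{a^{*}(R_1 + R_2)a} + \frac{2(\mathrm{Re}(a^{*}\delta))^2}{(a^{*}(R_1 + R_2)a)^2}\Big]R_1a$

    $\qquad + \Big[\frac{1}{a^{*}R_2a} - \frac{2}{a^{*}(R_1 + R_2)a} + \frac{2(\mathrm{Re}(a^{*}\delta))^2}{(a^{*}(R_1 + R_2)a)^2}\Big]R_2a.$

Thus, it follows from the same argument as used in the real
case that the value of $a$ for which $-\ln\rho_2(1,2;\ell)$ is a maximum
is given by

    $a = (R_1 - \theta R_2)^{-1}\delta,$    (6.3.2)

where

    $-\theta = \frac{2\Big\{\dfrac{(\mathrm{Re}(a^{*}\delta))^2}{(a^{*}(R_1 + R_2)a)^2} - \dfrac{1}{a^{*}(R_1 + R_2)a}\Big\} + \dfrac{1}{a^{*}R_2a}}{2\Big\{\dfrac{(\mathrm{Re}(a^{*}\delta))^2}{(a^{*}(R_1 + R_2)a)^2} - \dfrac{1}{a^{*}(R_1 + R_2)a}\Big\} + \dfrac{1}{a^{*}R_1a}}.$    (6.3.3)

An iterative procedure must be employed to solve for $a$.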
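
One possible shape of such an iteration is sketched below. It assumes the update $a = (R_1 - \theta R_2)^{-1}\delta$ with $-\theta$ equal to the ratio $\big(2\{X^2/Q^2 - 1/Q\} + 1/Q_2\big) / \big(2\{X^2/Q^2 - 1/Q\} + 1/Q_1\big)$, where $X = \mathrm{Re}(a^{*}\delta)$, $Q_j = a^{*}R_ja$ and $Q = Q_1 + Q_2$. The function names, the starting value $\theta = -1$, the stopping rule and the test matrices are all assumptions made for illustration; convergence is not guaranteed in general, and the solve can fail if $R_1 - \theta R_2$ becomes singular.

```python
import numpy as np

def theta_of(a, delta, R1, R2):
    """theta as a function of the current a (assumed form, see lead-in)."""
    q1 = (a.conj() @ R1 @ a).real
    q2 = (a.conj() @ R2 @ a).real
    q = q1 + q2
    x = (a.conj() @ delta).real
    common = 2.0 * (x ** 2 / q ** 2 - 1.0 / q)
    return -(common + 1.0 / q2) / (common + 1.0 / q1)

def iterate_a(delta, R1, R2, n_iter=50, tol=1e-10):
    """Alternate between theta and a = (R1 - theta R2)^(-1) delta."""
    th = -1.0                                   # assumed starting value
    a = np.linalg.solve(R1 - th * R2, delta)
    for _ in range(n_iter):
        th_new = theta_of(a, delta, R1, R2)
        a = np.linalg.solve(R1 - th_new * R2, delta)
        converged = abs(th_new - th) < tol
        th = th_new
        if converged:
            break
    return a, th

# Equal-covariance sanity case (illustrative Hermitian p.d. matrix).
R0 = np.array([[2.0, 1.0j], [-1.0j, 2.0]])
delta = np.array([1.0 + 1.0j, 0.5 + 0.0j])
a_hat, th_hat = iterate_a(delta, R0, R0)
print(th_hat, a_hat)
```

When the two covariance matrices coincide the iteration stays at $\theta = -1$, so that $a$ is proportional to $R^{-1}\delta$, the familiar equal-covariance discriminant.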

Remark 6.3.1: Since there always exists a non-singular matrix
$P$ such that

    $R_1 = P^{*}P$
and $R_2 = P^{*}\Lambda P$

where $\Lambda$ is diagonal with elements as the eigenvalues of
$R_2R_1^{-1}$, a similar study on convergence of the iteration
involved in (6.3.2) can be carried out as we have done in
the real case.

6.3.2 METHOD FOR FINDING a IN THE CASE OF LARGE SAMPLE FOR 
COVARIANCE STATIONARY TIME SERIES 

The following theorem gives an explicit expression for
the optimal vector $a$ in the sense of maximizing the Bhattacharyya
distance asymptotically under certain regularity conditions
similar to those made in the real case.

Theorem 6.3.1: Let

(1) the n-dimensional vector $z = (Z(0), \dots, Z(n-1))'$ be
a covariance stationary complex normal time series with mean
$\mu_j$ and covariance matrix $R_j = ((r_j(s-t),\ s,t = 0, \dots, n-1))$
under hypotheses $H_j$ $(j = 1,2)$,

(2) the spectral densities $w_j(\lambda)$ of the process under
the hypotheses are positive on $[-\pi, \pi]$,

(3) the sequence of mean differences $\delta(t)$ satisfies

    (i) $\sup_t |\delta(t)| < \infty$

    (ii) $\frac{1}{n}\sum_{t=0}^{n-1-|\tau|} \delta(t)\,\delta(t + |\tau|)$ has a limit, as $n \to \infty$, given by

         $\int_{-\pi}^{\pi} e^{i\tau\lambda}\,dU(\lambda),$

where $U(\lambda)$ plays the role of $M(\lambda)$ as described in the real
case,

(4) $\sum_{t=-\infty}^{\infty} |r_j(t)|^p < \infty$

where $0 < p < 1$ $(j = 1,2)$.

Then the desired optimal $a$ is given by

    $a = (R_1 + R_2)^{-1}\delta.$

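Proof: We note that, for $a = (R_1 + R_2)^{-1}\delta$ and $R = R_1 + R_2$,

    $\ln\rho_2(1,2;\ell) = \frac{1}{4}\ln(\delta^{*}R^{-1}R_1R^{-1}\delta) + \frac{1}{4}\ln(\delta^{*}R^{-1}R_2R^{-1}\delta) - \frac{1}{2}\ln\big(\tfrac{1}{2}\,\delta^{*}R^{-1}\delta\big) - \frac{1}{2}\,\delta^{*}R^{-1}\delta.$

The rest follows if we use the same approach as in the real
case.

The following remark is inevitable.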


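Remark 6.3.2: The linear procedure (6.2.2) gives rise to
two types of errors which are as follows:

    $e_1 = \Pr_1(\mathrm{Re}(a^{*}z) < c) = 1 - \Phi(y_1)$, where $y_1 = \frac{\mathrm{Re}(a^{*}\mu_1) - c}{(\frac{1}{2}a^{*}R_1a)^{1/2}}$

and

    $e_2 = \Pr_2(\mathrm{Re}(a^{*}z) > c) = 1 - \Phi(y_2)$, where $y_2 = \frac{c - \mathrm{Re}(a^{*}\mu_2)}{(\frac{1}{2}a^{*}R_2a)^{1/2}}$.

Thus,

    $y_2 = \frac{\mathrm{Re}(a^{*}\delta) - y_1(\frac{1}{2}a^{*}R_1a)^{1/2}}{(\frac{1}{2}a^{*}R_2a)^{1/2}}.$

Suppose $e_1$ is given or equivalently $y_1$. Then one can show very
easily that $y_2 \to \infty$ as $n \to \infty$ when $a = (R_1 + R_2)^{-1}\delta$.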

6.3.3 SOME SPECIAL CATEGORIES OF PROBLEMS

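1) Suppose $\delta$ is a null vector. Then from (6.3.1)

    $\rho_2(1,2;\ell) = \Big[\frac{(a^{*}R_1a)^{1/2}(a^{*}R_2a)^{1/2}}{\frac{1}{2}a^{*}(R_1 + R_2)a}\Big]^{1/2}$    (6.3.4)

Since there always exists a non-singular matrix $P$ such that

    $R_1 = P^{*}P$
    $R_2 = P^{*}\Lambda P$

where $\Lambda$ is diagonal with elements as the eigenvalues of
the matrix $R_2R_1^{-1}$, we can rewrite (6.3.4) as

    $\rho_2(1,2;\ell) = \Big[\frac{2\,\big(\frac{\beta^{*}\Lambda\beta}{\beta^{*}\beta}\big)^{1/2}}{1 + \frac{\beta^{*}\Lambda\beta}{\beta^{*}\beta}}\Big]^{1/2}$

where

    $Pa = \beta.$    (6.3.5)

By a similar argument as in the real case, we have the
following.

Theorem 6.3.2: The optimal $\beta$ is the eigenvector corresponding
to $\lambda_{\max}(R_2R_1^{-1})$ or $\lambda_{\min}(R_2R_1^{-1})$ according as

    $\lambda_{\max}(R_2R_1^{-1})\,\lambda_{\min}(R_2R_1^{-1}) > \text{ or } < 1.$

The optimal $a$ is obtained by solving (6.3.5).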


2) Let

    $Z(t) = \begin{cases} b\,\mu(t) + n(t), & \text{under } H_1 \\ n(t), & \text{under } H_2, \end{cases}$

where $b$ is a complex normal random variable with zero mean
and $E|b|^2 = 1$, $\mu(t)$ is a complex-valued deterministic function
of $t$, and $n(t)$ is a zero mean complex normal process. The above
model corresponds to the radar problem where we have to decide
whether or not a target is present at a particular location
([64]).


Let $n$ observations be made on $\{Z(t),\ t \in T\}$ and

    $z = (Z(0), Z(1), \dots, Z(n-1))'$

    $\mu = (\mu(0), \dots, \mu(n-1))'$

    $n = (n(0), \dots, n(n-1))'$

Then $E z = 0$, $R_1 = E_{H_1} zz^{*} = E(b\mu + n)(b\mu + n)^{*} = R_2 + \mu\mu^{*}$,
where $E\,nn^{*} = R_2$.

Then (6.3.1) reduces to

    $\rho_2(1,2;\ell) = (2)^{1/2}\,\frac{[\{a^{*}(R_2 + \mu\mu^{*})a\}\{a^{*}R_2a\}]^{1/4}}{(a^{*}(2R_2 + \mu\mu^{*})a)^{1/2}}$

    $\phantom{\rho_2(1,2;\ell)} = \frac{\big(1 + \frac{|a^{*}\mu|^2}{a^{*}R_2a}\big)^{1/4}}{\big(1 + \frac{1}{2}\frac{|a^{*}\mu|^2}{a^{*}R_2a}\big)^{1/2}}$    (6.3.6)

Then we have the following

Theorem 6.3.3: The optimal $a$ in the sense of maximizing
$-\ln\rho_2(1,2;\ell)$ is given by

    $a = R_2^{-1}\mu.$    (6.3.7)

Proof: Noting (6.3.6) is of the form

    $y = \frac{(1 + x)^{1/4}}{(1 + \frac{1}{2}x)^{1/2}}$

where $y$ decreases as $x$ increases, by virtue of the following
inequality

    $|a^{*}\mu|^2 \le (a^{*}R_2a)(\mu^{*}R_2^{-1}\mu),$

(see [57]), the theorem follows at once.


3) Let

    $z = \begin{cases} \mu + n, & \text{under } H_1 \\ n, & \text{under } H_2, \end{cases}$

where $\mu$ has an n-variate complex (real) normal distribution
with zero mean and covariance matrix $R_3$, and $n$ has an n-variate
normal distribution with zero mean and covariance matrix $R_2$,
independent of $\mu$.

Then $R_1 = E\,zz^{*} = R_2 + R_3$.

Consequently, (6.3.4) reduces to

    $\rho_2(1,2;\ell) = \Big[\frac{(a^{*}(R_2 + R_3)a)^{1/2}(a^{*}R_2a)^{1/2}}{\frac{1}{2}a^{*}(2R_2 + R_3)a}\Big]^{1/2} = \frac{\big(1 + \frac{a^{*}R_3a}{a^{*}R_2a}\big)^{1/4}}{\big(1 + \frac{1}{2}\frac{a^{*}R_3a}{a^{*}R_2a}\big)^{1/2}}$    (6.3.8)

Then we have the following theorem.

Theorem 6.3.4: The optimum $a$ in the sense of maximizing
$-\ln\rho_2(1,2;\ell)$ is the eigenvector corresponding to the
maximum eigenvalue of $R_2^{-1}R_3$.

Proof: It follows once we observe that (6.3.8) is of the
form

    $y = \frac{(1 + x)^{1/4}}{(1 + \frac{1}{2}x)^{1/2}}$

where $x > 0$.
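
The claim that the top generalized eigenvector minimizes the Bhattacharyya coefficient can be spot-checked numerically. The sketch assumes the coefficient has the form $y = (1+x)^{1/4}/(1+\frac{1}{2}x)^{1/2}$ with $x = a^{*}R_3a / a^{*}R_2a$; the random Hermitian positive definite matrices, the seed and the helper names are illustrative assumptions.

```python
import numpy as np

def coefficient(a, R2, R3):
    """y = (1 + x)^(1/4) / (1 + x/2)^(1/2) with x = a*R3a / a*R2a."""
    x = (a.conj() @ R3 @ a).real / (a.conj() @ R2 @ a).real
    return (1.0 + x) ** 0.25 / (1.0 + 0.5 * x) ** 0.5

rng = np.random.default_rng(0)

def random_hpd(n):
    """Random Hermitian positive definite matrix (illustrative input)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + n * np.eye(n)

R2, R3 = random_hpd(4), random_hpd(4)

# Eigenvector of R2^(-1) R3 belonging to its largest eigenvalue.
w, V = np.linalg.eig(np.linalg.solve(R2, R3))
a_opt = V[:, np.argmax(w.real)]

trials = rng.standard_normal((200, 4)) + 1j * rng.standard_normal((200, 4))
best = coefficient(a_opt, R2, R3)
worst_gap = min(coefficient(t, R2, R3) - best for t in trials)
print(best, worst_gap)
```

A non-negative `worst_gap` means that none of the random directions achieved a smaller coefficient than the eigenvector, which is what one expects since y decreases in x.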

6.4 CONTINUOUS TIME SERIES 
6.4.1 SECOND ORDER TIME SERIES 

The observed time series $\{Z(t),\ t \in T\}$ is assumed to
be continuous in time. Let $T = [0, A]$, where $A$ is a (finite) real
number. Assume $E|Z(t)|^2 < \infty$ for all $t \in T$. Let

    $E_{H_j} Z(t) = \mu_j(t)$    (6.4.1)

    $\mathrm{Cov}_{H_j}(Z(t), Z(u)) = R_j(t,u)$    (6.4.2)

where $(j = 1,2)$ and $t, u \in T$.

Our first step is to reduce the process $\{Z(t),\ t \in T\}$ to a set
of random variables (possibly a countably infinite set). This
is achieved by the method of series expansion:

    $Z(t) = \sum_{n=1}^{\infty} z_n\,\varphi_n(t)$

in the following sense:

    $\lim_{K \to \infty} E\Big|Z(t) - \sum_{n=1}^{K} z_n\,\varphi_n(t)\Big|^2 = 0, \quad 0 \le t \le A$    (6.4.3)

where

    $z_n = \int_0^A Z(t)\,\varphi_n^{*}(t)\,dt$    (6.4.4)

and $\{\varphi_n(t)\}$ is a complete set of complex orthonormal functions
in the interval $[0, A]$, i.e.

    $\int_0^A \varphi_n(t)\,\varphi_m^{*}(t)\,dt = \delta_{nm}$

(see Appendix F).

Following the arguments as in the real case, the linear
procedures of interest are given by

    $\ell = \mathrm{Re}\Big(\int_0^A a^{*}(t)\,Z(t)\,dt\Big)\ \underset{H_2}{\overset{H_1}{\gtrless}}\ c$    (accept)    (6.4.5)

(see Appendix I for a definition of the stochastic integral of a
complex process). The problem is now to find an optimal
$a(t)$, $t \in [0, A]$, in the sense of maximizing $-\ln\rho_2(1,2;\ell)$, where
$\ell$ is defined in (6.4.5).

The following theorem states the method for obtaining $a(t)$, $t \in [0, A]$.

Theorem 6.4.1: The $a(t)$, $t \in [0, A]$, for which $-\ln\rho_2(1,2;\ell)$ is a
maximum satisfies an integral equation of the following type,

    $\int_0^A R_\theta(t,u)\,a(t)\,dt = \delta(u),$    (6.4.6)

to be solved iteratively, where $\theta$ is given by

    $-\theta = \frac{2\Big\{\dfrac{X^2}{Q^2} - \dfrac{1}{Q}\Big\} + \dfrac{1}{Q_2}}{2\Big\{\dfrac{X^2}{Q^2} - \dfrac{1}{Q}\Big\} + \dfrac{1}{Q_1}},$

with

    $X = \mathrm{Re}\Big(\int_0^A a^{*}(t)\,\delta(t)\,dt\Big),$

    $Q_j = \int_0^A\!\!\int_0^A R_j(t,u)\,a^{*}(t)\,a(u)\,dt\,du \quad (j = 1,2),$

    $Q = \int_0^A\!\!\int_0^A R(t,u)\,a^{*}(t)\,a(u)\,dt\,du,$

and

    $R_\theta(t,u) = R_1(t,u) - \theta R_2(t,u)$

    $R(t,u) = R_1(t,u) + R_2(t,u)$

The iteration continues until $|\theta_{i+1} - \theta_i| < \varepsilon$, where $\varepsilon$ is a
pre-assigned number and $i$ denotes the number of the iteration.

Proof: We adopt the approach taken in proving Theorem 4.2.1.

6.4.2 COVARIANCE STATIONARY TIME SERIES WITH THE INDEX SET [0,∞)


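The following theorem gives an explicit expression for
$a(t)$, $0 < t < \infty$, if $Z(t)$ is a stationary time series.

Theorem 6.4.2: Let

    $S_j(\lambda) = \int_0^{\infty} V_j(v)\,e^{-i\lambda v}\,dv$    (6.4.7)

    $D_j(\lambda) = \int_0^{\infty} \delta_j(u)\,e^{-i\lambda u}\,du, \quad (j = 1,2)$    (6.4.8)

where

    $V_1(v) = \mathrm{Re}\,R_1(v) + \mathrm{Re}\,R_2(v)$    (6.4.9)

    $V_2(v) = \mathrm{Im}\,R_1(v) + \mathrm{Im}\,R_2(v)$    (6.4.10)

and $\delta(v) = \delta_1(v) + i\,\delta_2(v)$.    (6.4.11)

Then under assumptions similar to those given in
Theorem 4.3.1 with the obvious modifications required for
the complex case, the optimal $a(t)$ is given by

    $a(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} (A_1(\lambda) + i\,A_2(\lambda))\,e^{i\lambda t}\,d\lambda$    (6.4.12)

where

    $A_1(\lambda) = \frac{S_1(\lambda)D_1(\lambda) + S_2(\lambda)D_2(\lambda)}{S_1^2(\lambda) + S_2^2(\lambda)}$    (6.4.13)

    $A_2(\lambda) = \frac{S_2(\lambda)D_1(\lambda) - S_1(\lambda)D_2(\lambda)}{S_1^2(\lambda) + S_2^2(\lambda)}, \quad (-\infty < \lambda < \infty)$    (6.4.14)

The following lemma, which is known as the sampling theorem for
complex stochastic processes, is essential in proving the above
theorem.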


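Lemma 6.4.2: If for some $\Lambda$ (real number), the spectral density
of a covariance stationary complex process $\{Z(t),\ 0 < t < \infty\}$
is zero outside of the frequency interval $-\Lambda \le \lambda \le \Lambda$, then

    $Z(t) = \sum_{k=0}^{\infty} \frac{\sin \Lambda(t - k\pi/\Lambda)}{\Lambda(t - k\pi/\Lambda)}\,Z\Big(\frac{k\pi}{\Lambda}\Big), \quad (0 < t < \infty)$

We omit the proof as it is readily available in ([29], [?]).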





Proof of Theorem 6.4.2:

Using the approach as in Theorem 4.3.1, if we write

    $a(t) = a_1(t) + i\,a_2(t),$

    $\delta(t) = \delta_1(t) + i\,\delta_2(t),$

    $R_j(t) = \mathrm{Re}\,R_j(t) + i\,\mathrm{Im}\,R_j(t), \quad (j = 1,2),$

we end up with the following simultaneous integral equations
after equating the real and imaginary parts of the equation
(6.4.6):

    $\int_0^{\infty} V_1(t-u)\,a_1(t)\,dt + \int_0^{\infty} V_2(t-u)\,a_2(t)\,dt = \delta_1(u)$

    $\int_0^{\infty} V_2(t-u)\,a_1(t)\,dt - \int_0^{\infty} V_1(t-u)\,a_2(t)\,dt = \delta_2(u)$    (6.4.15)

By an application of Lemma 4.3.1 of Chapter IV, (6.4.15) reduces
to

    $S_1(\lambda)A_1(\lambda) + S_2(\lambda)A_2(\lambda) = D_1(\lambda)$

    $S_2(\lambda)A_1(\lambda) - S_1(\lambda)A_2(\lambda) = D_2(\lambda)$    (6.4.16)

From (6.4.16), the theorem follows at once.


6.4.3 SOME SPECIAL CATEGORIES OF PROBLEMS 


1) Let

    $Z(t) = \begin{cases} b\,\mu(t) + n(t), & \text{under } H_1 \\ n(t), & \text{under } H_2, \end{cases} \quad 0 \le t \le A$    (6.4.17)

where $b$, $\mu(t)$, $n(t)$ are as described in Section 6.3.3.
Then the following theorem states how to obtain the optimum $a(t)$ in
the sense of maximizing $-\ln\rho_2(1,2;\ell)$.

Theorem 6.4.3: The optimum $a(t)$, $0 \le t \le A$, can be obtained
by solving the following integral equation:

    $\int_0^A R_2(t,u)\,a(t)\,dt = \mu(u)$    (6.4.18)

Proof: We adopt the approach taken in proving Theorem 4.2.1.


2) Let $n(t)$ contain two statistically independent components
$n_c(t)$ and $w(t)$, where $w(t)$ is a white complex normal process
with

    $E\,w(t)\,w^{*}(u) = \delta_D(t-u).$

Then the model (6.4.17) can be recast in the form:

    $Z(t) = \begin{cases} b\,\mu(t) + n_c(t) + w(t), & \text{under } H_1 \\ n_c(t) + w(t), & \text{under } H_2. \end{cases}$    (6.4.19)

We consider a special form of this model in which

    $n_c(t) = \sum_{j=1}^{K} b_j\,\mu(t - \tau_j)\,e^{i\omega_j t}.$

Then (6.4.19) reduces to

    $Z(t) = \begin{cases} b\,\mu(t) + \sum_{j=1}^{K} b_j\,\mu(t - \tau_j)\,e^{i\omega_j t} + w(t), & \text{under } H_1 \\ \sum_{j=1}^{K} b_j\,\mu(t - \tau_j)\,e^{i\omega_j t} + w(t), & \text{under } H_2, \end{cases} \quad -\infty < t < \infty$    (6.4.20)

where the $\tau_j$'s, $\omega_j$'s and $K$ are given constants, and $b$ and the $b_j$ are
zero-mean complex normal r.v.s. that are statistically independent
with unequal variances:

    $E\,b\,b^{*} = 2\sigma_b^2$    (6.4.21)

    $E\,b_j\,b_k^{*} = 2\sigma_j^2\,\delta_{jk}, \quad j,k = 1, \dots, K$    (6.4.22)



and E b , b , = E b, b. = E b,b* = E b^b , = n V i = 1 K 

QQ -LH dj -ii***-*^ 

(6.4.23) 

The above model (6.4.20) corresponds to the problem (known as the
"Resolution problem in discrete environment" ([64])) of
detecting a desired target in the presence of interfering
targets. The interesting feature of this model is that the
noise process depends on $\mu(t)$.

The covariance function of $n_c(t)$ is given by

    $K_c(t,u) = 2\sum_{j=1}^{K} \sigma_j^2\,\mu(t - \tau_j)\,\mu^{*}(u - \tau_j)\,e^{i\omega_j(t-u)}$    (6.4.24)

Since $R_2(t,u) = K_c(t,u) + \delta_D(t-u)$, using (6.4.24), (6.4.18)
reduces to

    $\mu(t) = \int_{-\infty}^{\infty} K_c(t,u)\,a(u)\,du + a(t)$

    $\phantom{\mu(t)} = \int_{-\infty}^{\infty} \Big[2\sum_{j=1}^{K} \sigma_j^2\,\mu(t - \tau_j)\,\mu^{*}(u - \tau_j)\,e^{i\omega_j(t-u)}\Big]a(u)\,du + a(t),$    (6.4.25)

    $\phantom{\mu(t)} = 2\sum_{j=1}^{K} \sigma_j^2\,\mu(t - \tau_j)\,e^{i\omega_j t}\Big(\int_{-\infty}^{\infty} \mu^{*}(u - \tau_j)\,e^{-i\omega_j u}\,a(u)\,du\Big) + a(t)$    (6.4.26)

We note that (6.4.25) is an integral equation with a separable
kernel. The solution to (6.4.26) is given by

    $a(t) = \mu(t) + \sum_{j=1}^{K} a_j\,\mu(t - \tau_j)\,e^{i\omega_j t}$    (6.4.27)

where the $a_j$ are constants to be determined. In what follows we
describe a method for calculating the constants (see [64]).



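
Define

    $b = (b_1, b_2, \dots, b_K)'$

    $\mu_I(t) = (\mu(t - \tau_1)e^{i\omega_1 t}, \dots, \mu(t - \tau_K)e^{i\omega_K t})'$

    $\Lambda = E\,b\,b^{*} = 2\,\mathrm{diag}(\sigma_1^2, \dots, \sigma_K^2)$

and

    $P = \int_{-\infty}^{\infty} \mu_I(t)\,[\mu_I^{*}(t)]'\,dt.$

Thus (6.4.24) can be rewritten as

    $K_c(t,u) = \mu_I'(t)\,\Lambda\,\mu_I^{*}(u).$    (6.4.28)

We can write (6.4.27) in the matrix notation as

    $a(t) = \mu(t) + \mathbf{a}'\mu_I(t) = \mu(t) + \mu_I'(t)\,\mathbf{a},$    (6.4.29)

where $\mathbf{a} = (a_1, a_2, \dots, a_K)'$.

Substituting (6.4.28) and (6.4.29) in (6.4.25), we have

    $\mu(t) = \int_{-\infty}^{\infty} \{\mu_I'(t)\,\Lambda\,\mu_I^{*}(u) + \delta_D(t-u)\}\{\mu(u) + \mu_I'(u)\,\mathbf{a}\}\,du$

    $\phantom{\mu(t)} = \mu_I'(t)\,\Lambda\,\rho + \mu_I'(t)\,\Lambda P^{*}\mathbf{a} + \mu(t) + \mu_I'(t)\,\mathbf{a},$    (6.4.30)

where

    $\rho = \int_{-\infty}^{\infty} \mu_I^{*}(u)\,\mu(u)\,du.$    (6.4.31)

The required $\mathbf{a}$ is thus given by

    $\mathbf{a} = -(I + \Lambda P^{*})^{-1}\Lambda\,\rho.$    (6.4.32)

The above result is illustrated by a simple example in
Van Trees ([64]).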
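
The reduction of a separable-kernel integral equation to a small algebraic system can be illustrated on a time grid. The sketch below discretizes an equation of the form $\mu(t) = \int K_c(t,u)a(u)\,du + a(t)$ with $K_c(t,u) = \sum_j 2\sigma_j^2\,m_j(t)\,m_j^{*}(u)$ and $m_j(t) = \mu(t-\tau_j)e^{i\omega_j t}$, then compares a direct solve with the K-dimensional route $a = \mu + M\alpha$, $\alpha = -(I + \Lambda G)^{-1}\Lambda\rho$. The Gaussian pulse, the delays, the frequency shifts, the variances and the grid are all assumed values for illustration, and `G`, `rho` are the discretized analogues of the matrix and vector integrals.

```python
import numpy as np

# Hypothetical discretisation: K = 2 interfering echoes on a uniform grid.
t = np.linspace(-5.0, 5.0, 401)
dt = t[1] - t[0]

def pulse(s):
    # illustrative signal envelope mu(t)
    return np.exp(-s ** 2)

taus = np.array([0.8, -1.1])        # delays tau_j (assumed)
omegas = np.array([1.5, -0.7])      # frequency shifts omega_j (assumed)
sig2 = np.array([0.6, 1.3])         # variances sigma_j^2 (assumed)

mu = pulse(t).astype(complex)
# Column j holds m_j(t) = mu(t - tau_j) * exp(i * omega_j * t).
M = np.stack([pulse(t - tau) * np.exp(1j * om * t)
              for tau, om in zip(taus, omegas)], axis=1)
Lam = 2.0 * np.diag(sig2)

# Direct route: solve (K_c dt + I) a = mu with K_c = M Lam M^H.
a_direct = np.linalg.solve(M @ Lam @ M.conj().T * dt + np.eye(t.size), mu)

# Separable route: only a K x K system has to be solved.
G = M.conj().T @ M * dt             # discretised integral of conj(m) m'
rho = M.conj().T @ mu * dt          # discretised integral of conj(m) mu
alpha = -np.linalg.solve(np.eye(len(taus)) + Lam @ G, Lam @ rho)
a_sep = mu + M @ alpha

print(np.max(np.abs(a_direct - a_sep)))
```

The two routes agree to rounding error because the separable form is an exact algebraic identity for the discretized system; the saving is that the second route never inverts a matrix larger than K x K.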





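Remark 6.4.1: Let us consider the model as in (6.4.17). Then
the $a(t)$ which maximizes $-\ln\rho_2(1,2;\ell)$ satisfies (6.4.18), that
is,

    $\int_0^A R_2(t,u)\,a(t)\,dt = \mu(u).$

Now, our test is

    $\ell = \mathrm{Re}\Big(\int_0^A a^{*}(t)\,Z(t)\,dt\Big)\ \underset{H_2}{\overset{H_1}{\gtrless}}\ c.$

The Bayes optimum test is given by (see [64])

    $\Big|\int_0^A g^{*}(t)\,Z(t)\,dt\Big|^2\ \underset{H_2}{\overset{H_1}{\gtrless}}\ \gamma$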


_-x. 2 


where 


and g satisfies 


2 2 

A ,, ofof 

7 = (In m - In y~') 


l-a> ^ „2_„2 
01-02 


    $\int_0^A R_2(t,u)\,g(t)\,dt = \mu(u)$


(6.4.33) 


(6.4.34) 


(6. 4.35) 


which is the same as (6.4.33). 


6.5 DESIGN OF SIGNALS 

1) Model I

Let

    $Z(t) = \begin{cases} \mu(t) + n_1(t), & \text{under } H_1 \\ n_2(t), & \text{under } H_2, \end{cases} \quad t \in T$

where $n_1(t)$ and $n_2(t)$ are two zero mean complex normal processes.

    $e_1 = e_2 \iff y_1 = \frac{(2)^{1/2}\,\mathrm{Re}(a^{*}\mu)}{(a^{*}R_1a)^{1/2} + (a^{*}R_2a)^{1/2}}$

Then we have the following theorem analogous to that in the real
case.


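Theorem 6.5.1: Our signal selection criterion is

    $\max_{\mu^{*}\mu = 1}\ \max_{a}\ \frac{a^{*}\mu}{(a^{*}R_ja)^{1/2}}$  $(j = 1 \text{ or } 2)$

The optimal $a$ and $\mu$ are given by

    $a_0 = R_j^{-1}\mu_0$

and $\mu_0$ is an eigenvector corresponding to $\lambda_{\min}(R_1)$ or
$\lambda_{\min}(R_2)$ according as

    $\frac{(\lambda_{\max}(R_1^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_1^{-1}R_2))^{1/2}}\ \ge \text{ or } <\ \frac{(\lambda_{\max}(R_2^{-1}))^{1/2}}{1 + (\lambda_{\max}(R_2^{-1}R_1))^{1/2}}.$

Proof: It follows immediately if we note that the results of
([57]) used in proving Theorem 5.3.1 hold good in the complex
case also.

Remark 6.5.1: If we set $e_1 = k_1 e_2$, the same result follows.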





Model II:

The following model occurs in communication.

    $Z(t) = \begin{cases} (b + k)\,\mu(t) + n(t), & \text{under } H_1 \\ n(t), & \text{under } H_2, \end{cases}$

where $k$ is a given constant, and $b$, $\mu(t)$, $n(t)$ are the same as
in (6.4.17). The following theorem states what is the optimal signal.





without loss of generality we can take $a^{*}\mu$ to be real.

Thus, (6.5.1) reduces to

    $y_1 = \frac{(2)^{1/2}\,k\,(a^{*}\mu)/(a^{*}R_2a)^{1/2}}{1 + \big(1 + \frac{|a^{*}\mu|^2}{a^{*}R_2a}\big)^{1/2}}$

which is of the form

    $y = \frac{(2)^{1/2}\,k\,x}{1 + (1 + x^2)^{1/2}}$

where $y$ increases as $x$ increases. Thus the problem reduces to:

    $\max_{\mu^{*}\mu = 1}\ \max_{a}\ \frac{a^{*}\mu}{(a^{*}R_2a)^{1/2}},$

hence the theorem.


Remark 6.5.2: The same result is obtained if our criterion is:

    min Pr(e) subject to $e_1 = k_1 e_2$.

Remark 6.5.3: The above results can easily be extended to the
continuous case.

6.6 CONCLUSION 

As we have seen in the real case, the compact form of the
optimal linear discriminant function with respect to the criterion
of maximizing the Bhattacharyya distance is available only in the
asymptotic case if we consider the covariance stationary discrete
time series and in the case when the continuous time series is
covariance stationary with the index set infinite. In some
other cases also explicit analytical forms of the LDFs are
available. In the case when the noise covariance kernel
is separable, finding the desired LDF reduces to solving
an algebraic system of equations.


























CHAPTER VII 


CONCLUSIONS AND SUGGESTIONS FOR FURTHER WORK 

On page 193 we have given a Flow Chart of the work
done in this thesis. We see in Chapter III that the
proposed criterion of maximizing the Bhattacharyya distance
for the optimal LDF leads to solving an implicit equation for
a discrete time series. The comparison of the performance
of our LDF with other LDFs and the QDF reveals that the test
statistic of our interest is worth considering. In the
case of stationary time series, we observe that while no
compact form of optimal LDFs is available for the criteria
considered so far in the literature, the maximization of
the Bhattacharyya distance does yield one asymptotically.

To the best of our knowledge, LDFs for continuous time
series have not been considered so far in the literature.

This issue is taken up in Chapter IV, where we notice that
finding the optimal LDF amounts to solving an integral equation
of Fredholm type, and we are able to obtain a compact form of
the optimal LDF in the case when the time series is stationary
with the observation interval infinite.

In the "Design of Signals" problem considered in Chapter V,
an analytical solution for the optimal signal is available
only through a bound on the total probability of misclassification.





In Chapter VI we are able to extend all the major 
results from the real to the complex-valued time series. 

A wide variety of interesting problems arise out of 
the investigations carried out in this thesis. Some of 
them are mentioned below. 

We have seen in Chapter III that an iterative scheme
is employed to find a desired a. It would be interesting
to develop a recursive scheme for finding the same.

A good deal of effort is needed to make comparisons
of the performance of our procedure with other LDFs as well
as with the QDF for continuous time series and to study the
convergence of the iteration process given in Chapter IV.

We see in Chapter VI that under the model in (6.4.17) 
the integral equations yielding the desired a(t) of the linear 
procedure and the g(t) of the Bayes optimal test are identical. 
Further investigation is needed to determine the class of 
covariance matrices for which the above assertion holds good. 

We have dealt with the two-group classification problem
in the previous chapters. The classification problem involving
more than two groups has to be looked into; an approach
adopted in [60] may be useful.

Suppose we have observations $x_{ij}$, $j = 1,2,\ldots,n_i$,
from $H_i$ ($i = 1,2$). The problem is to assign $x$ to $H_1$ or $H_2$ when
$p_i(x)$ is not known. The following approach is given in [58].
Calculate the distance between the sample distribution functions
of $H_i$ ($i = 1,2$) for the two cases, the first including
$x$ in $H_1$ and the second including $x$ in $H_2$. Assign $x$ to the
hypothesis that gives the smaller distance. The effectiveness
of the Bhattacharyya distance in this direction remains
unexplored.

In the present work, we have considered normal processes.
The area where the assumption of normality is no longer valid
but the process of concern is best modeled by a Markov process
(for example, AR processes) calls for further investigation ([63]).





APPENDIX A 

Bhattacharyya distance and the triangle inequality

In the following example (see [26]) we shall see
that the Bhattacharyya distance need not satisfy the
triangle inequality.

Take three normal distributions with means zero and
standard deviations 1, 4, and 5, denoted by $p_1$, $p_2$, $p_3$
respectively. Then we shall show

\[ -\ln\rho_2(p_1,p_2) - \ln\rho_2(p_2,p_3) < -\ln\rho_2(p_1,p_3). \tag{1} \]

Now,

\[ -\ln\rho_2(p_1,p_2) = \tfrac{1}{2}\ln\bigl(\tfrac{1}{2}(1+16)\bigr) - \tfrac{1}{4}\ln 16 = 0.3769, \]

\[ -\ln\rho_2(p_2,p_3) = 0.0124. \]

Thus,

\[ -\ln\rho_2(p_1,p_2) - \ln\rho_2(p_2,p_3) = 0.3893, \]

whereas $-\ln\rho_2(p_1,p_3) = 0.4778$.

From this, (1) follows.
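
The numbers in this example can be checked with a few lines of Python. The sketch below assumes the standard closed form for two zero-mean univariate normals, $B = \tfrac{1}{2}\ln\bigl((\sigma_a^2+\sigma_b^2)/(2\sigma_a\sigma_b)\bigr)$; the function name `bhattacharyya_zero_mean` is illustrative and does not come from the thesis.

```python
import math

def bhattacharyya_zero_mean(sigma_a, sigma_b):
    """Bhattacharyya distance between N(0, sigma_a^2) and N(0, sigma_b^2)."""
    return 0.5 * math.log((sigma_a**2 + sigma_b**2) / (2.0 * sigma_a * sigma_b))

# Standard deviations 1, 4 and 5, as in the example.
b12 = bhattacharyya_zero_mean(1.0, 4.0)
b23 = bhattacharyya_zero_mean(4.0, 5.0)
b13 = bhattacharyya_zero_mean(1.0, 5.0)

# The triangle inequality would require b13 <= b12 + b23.
violates_triangle = (b12 + b23) < b13
print(f"B12={b12:.4f}  B23={b23:.4f}  B13={b13:.4f}  violated={violates_triangle}")
```

The direct path from the first to the third distribution is longer than the detour through the second, which is exactly the failure of the triangle inequality.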



APPENDIX B 


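Combining two Quadratic forms

If $Q_j = (x-\mu_j)' R_j^{-1} (x-\mu_j)$ ($j = 1,2$), then

\[ Q_1 + Q_2 = (x-m)'\,S\,(x-m) + (\mu_1-\mu_2)'(R_1+R_2)^{-1}(\mu_1-\mu_2), \tag{1} \]

where

\[ m = (R_1^{-1}+R_2^{-1})^{-1}(R_1^{-1}\mu_1 + R_2^{-1}\mu_2) \tag{2} \]

and

\[ S = R_1^{-1}(R_1+R_2)R_2^{-1}. \tag{3} \]

Proof: Let $R_j^{-1} = A_j$ ($j = 1,2$) and $A = A_1 + A_2$. Thus,

\[
\begin{aligned}
Q_1 + Q_2 &= x'Ax - 2x'(A_1\mu_1 + A_2\mu_2) + \mu_1'A_1\mu_1 + \mu_2'A_2\mu_2 \\
&= x'Ax - 2x'Am + m'Am + \mu_1'A_1\mu_1 + \mu_2'A_2\mu_2 - m'Am \\
&= (x-m)'A(x-m) + \mu_1'A_1\mu_1 + \mu_2'A_2\mu_2 - m'Am,
\end{aligned}
\tag{4}
\]

where m is defined in (2). But

\[
\begin{aligned}
m'Am &= (A_1\mu_1 + A_2\mu_2)'A^{-1}AA^{-1}(A_1\mu_1 + A_2\mu_2) \\
&= (A_1\mu_1 + A_2\mu_2)'A^{-1}(A_1\mu_1 + A_2\mu_2) \\
&= \mu_1'A_1\mu_1 + \mu_2'A_2\mu_2 - (\mu_1-\mu_2)'A_1A^{-1}A_2(\mu_1-\mu_2).
\end{aligned}
\]

Thus,

\[ \mu_1'A_1\mu_1 + \mu_2'A_2\mu_2 - m'Am = (\mu_1-\mu_2)'A_1A^{-1}A_2(\mu_1-\mu_2). \tag{5} \]

Again,

\[ A = A_1 + A_2 = R_1^{-1} + R_2^{-1} = R_1^{-1}(R_1+R_2)R_2^{-1} = R_2^{-1}(R_2+R_1)R_1^{-1} \tag{6} \]

and

\[ A_1A^{-1}A_2 = (R_1+R_2)^{-1}. \tag{7} \]

Using (5), (6), and (7), from (4) the lemma follows.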
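
A quick numerical sanity check of the lemma is sketched below. It assumes the lemma reads $Q_1+Q_2 = (x-m)'S(x-m) + (\mu_1-\mu_2)'(R_1+R_2)^{-1}(\mu_1-\mu_2)$ with $m = (R_1^{-1}+R_2^{-1})^{-1}(R_1^{-1}\mu_1+R_2^{-1}\mu_2)$ and $S = R_1^{-1}(R_1+R_2)R_2^{-1}$; the random test matrices and the helper `random_spd` are illustrative choices, not part of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

def random_spd(rng, d):
    """Random symmetric positive definite matrix (hypothetical test data)."""
    g = rng.standard_normal((d, d))
    return g @ g.T + d * np.eye(d)

R1, R2 = random_spd(rng, d), random_spd(rng, d)
mu1, mu2, x = (rng.standard_normal(d) for _ in range(3))

A1, A2 = np.linalg.inv(R1), np.linalg.inv(R2)

# Left-hand side: the sum of the two quadratic forms.
Q_sum = (x - mu1) @ A1 @ (x - mu1) + (x - mu2) @ A2 @ (x - mu2)

# Right-hand side: one quadratic form in x plus a term free of x.
S = A1 @ (R1 + R2) @ A2
m = np.linalg.solve(A1 + A2, A1 @ mu1 + A2 @ mu2)
combined = (x - m) @ S @ (x - m) + (mu1 - mu2) @ np.linalg.solve(R1 + R2, mu1 - mu2)

print(np.isclose(Q_sum, combined))
```

The check also confirms that S coincides with $R_1^{-1}+R_2^{-1}$, which is the matrix A used in the proof.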



APPENDIX C 



[Flow chart of the iterative scheme; recoverable step: ''Calculate A2 using (3.3.7)''.]

IT = number of iterations
ITMAX = maximum number of iterations allowed
|.| = absolute value
= pre-assigned small quantity









APPENDIX D 

Solution of equations by Graeffe's method

Suppose the equation to be solved is

\[ f(x) = 0. \tag{1} \]

Let f(x) be expressed in the form

\[ f(x) = (x-\alpha_1)(x-\alpha_2)\cdots(x-\alpha_n), \]

where

\[ f(x) = \sum_{i=0}^{n} a_i x^{n-i}, \qquad a_0 = 1. \]

Here n is the degree of the equation, $a_0, a_1, \ldots, a_n$ are
the coefficients and $\alpha_1, \ldots, \alpha_n$ are the roots. It is
understood that $a_n \neq 0$. We assume the $\alpha_i$'s are real and
distinct.

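Consider the function $\varphi$ defined by

\[ \varphi(x) = (-1)^n f(x) f(-x) = (x^2-\alpha_1^2)(x^2-\alpha_2^2)\cdots(x^2-\alpha_n^2). \tag{2} \]

Since $\varphi(x)$ is a polynomial containing only even powers,
we may define the polynomial

\[ f_2(x) = \varphi(\sqrt{x}) = (x-\alpha_1^2)(x-\alpha_2^2)\cdots(x-\alpha_n^2), \]

which has the property that the roots of $f_2(x) = 0$ are the
squares of the roots of (1). Repeating this operation, we
obtain a sequence of polynomials $f_2, f_4, f_8, f_{16}, \ldots$ such that
the equation

\[ f_m(x) = (x-\alpha_1^m)(x-\alpha_2^m)\cdots(x-\alpha_n^m) = 0, \tag{3} \]

where m is a positive integral power of 2, has the roots
$\alpha_1^m, \ldots, \alpha_n^m$. If the roots of (1) are real and
$|\alpha_1| > |\alpha_2| > \cdots > |\alpha_n|$, then the ratios

\[ \left|\frac{\alpha_2^m}{\alpha_1^m}\right|,\ \left|\frac{\alpha_3^m}{\alpha_2^m}\right|,\ \ldots,\ \left|\frac{\alpha_n^m}{\alpha_{n-1}^m}\right| \]

can be made as small as desired by making m large enough.
Expanding (3) leads to

\[ f_m(x) = x^n - (\alpha_1^m + \cdots + \alpha_n^m)x^{n-1} + (\alpha_1^m\alpha_2^m + \cdots + \alpha_{n-1}^m\alpha_n^m)x^{n-2} - \cdots + (-1)^n\alpha_1^m\alpha_2^m\cdots\alpha_n^m. \tag{4} \]

Writing the right-hand side of (4) in the form

\[ x^n - A_1x^{n-1} + A_2x^{n-2} + \cdots + (-1)^n A_n, \]

we derive the approximation

\[ \alpha_1^m \approx A_1, \quad \alpha_2^m \approx \frac{A_2}{A_1}, \quad \ldots, \quad \alpha_n^m \approx \frac{A_n}{A_{n-1}}. \tag{5} \]

From these, by taking mth roots, we may approximate the
values of the roots of (1).

Since the signs of the roots are not determined by (5),
they must be checked by substitution.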

The coefficients may be found iteratively by the relations

\[ {}_{j+1}A_i = (-1)^i\Bigl[\,{}_jA_i^2 + 2\sum_{l=1}^{i}(-1)^l\,{}_jA_{i+l}\;{}_jA_{i-l}\Bigr], \tag{6} \]

where ${}_0A_i = a_i$ and $f_{2^j}(x) = \sum_{i=0}^{n} {}_jA_i\,x^{n-i}$, $0 \le i \le n$.
In the above formula, the presubscripts on A refer to the
iteration counter, that is, ${}_jA_i$ is the value of $A_i$ found
on the jth pass of the iterative scheme; ${}_0A_i$ is the
initial value of $A_i$. Indices greater than n mean that
the number to be used is zero.

Method of Solution:

Let the coefficients ${}_jA_i$ and ${}_{j+1}A_i$ described in (6) be
rewritten as

\[ c_i = {}_jA_i, \qquad B_i = {}_{j+1}A_i. \]

Then, given the initial values for the $c_i$, where

\[ c_i = {}_0A_i = a_i, \]

the $B_i$ for one iteration may be evaluated from

\[ B_0 = 1, \qquad B_i = (-1)^i\Bigl[c_i^2 + 2\sum_{l=1}^{i}(-1)^l c_{i+l}\,c_{i-l}\Bigr], \qquad i = 1, 2, \ldots, n. \]
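
The root-squaring pass and the recovery of the roots can be sketched in Python as follows. This is a minimal illustration for real, distinct roots only, assuming the coefficient recursion $B_i = (-1)^i[c_i^2 + 2\sum_{l=1}^{i}(-1)^l c_{i+l}c_{i-l}]$ with out-of-range indices treated as zero. The names `graeffe_step`, `horner` and `graeffe_roots` are hypothetical, and root magnitudes are taken from absolute values of ratios of successive coefficients, with the sign settled by substitution.

```python
def graeffe_step(c):
    """One root-squaring pass; c[i] is the coefficient of x^(n-i), c[0] == 1."""
    n = len(c) - 1
    coef = lambda k: c[k] if 0 <= k <= n else 0.0  # indices > n count as zero
    b = []
    for i in range(n + 1):
        s = coef(i) ** 2
        for l in range(1, i + 1):
            s += 2 * (-1) ** l * coef(i + l) * coef(i - l)
        b.append((-1) ** i * s)
    return b

def horner(a, x):
    """Evaluate the polynomial with coefficients a (highest power first) at x."""
    v = 0.0
    for coeff in a:
        v = v * x + coeff
    return v

def graeffe_roots(a, passes=6):
    """Approximate real, distinct roots, largest magnitude first."""
    c = [float(v) / a[0] for v in a]       # normalise so the leading coefficient is 1
    for _ in range(passes):
        c = graeffe_step(c)
    m = 2 ** passes                        # roots of the final polynomial are alpha_i^m
    roots = []
    for i in range(1, len(c)):
        mag = abs(c[i] / c[i - 1]) ** (1.0 / m)
        # The sign is not determined by the ratios: check it by substitution.
        roots.append(mag if abs(horner(a, mag)) <= abs(horner(a, -mag)) else -mag)
    return roots

roots = graeffe_roots([1, -6, 11, -6])     # x^3 - 6x^2 + 11x - 6
print(roots)
```

The number of passes is kept small on purpose: the coefficients grow like the m-th powers of the roots, so too many passes overflow floating-point range.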



APPENDIX E 


Two Lemmas


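Lemma :

(a)
\[ \lim_{n\to\infty}\frac{1}{n}\,\delta'\Sigma\,\delta = \int_{-\pi}^{\pi} h(\lambda)\,\frac{dM(\lambda)}{2\pi}. \]

Proof: $X' = (x(0), \ldots, x(n-1))$ is covariance stationary
with mean $\mu_j' = (\mu_j(0), \ldots, \mu_j(n-1))$, $j = 1,2$,
and covariance $\Sigma = ((\sigma(s-t)))$, $s,t = 0, \ldots, n-1$, where
$\sigma(s-t) = \mathrm{Cov}(X(s), X(t))$ under $H_j$.

Define the discrete Fourier transform of the $\{\sigma(n)\}$ sequence:

\[ h(\lambda) = \sum_{n=-\infty}^{\infty}\sigma(n)\,e^{i\lambda n}. \]

If the spectral density $h(\lambda)$ is continuous, then for an
arbitrary $\varepsilon > 0$ there are two trigonometric polynomials

\[ h_L(\lambda) = \sum_{k=-K}^{K}\sigma_L(k)\,e^{i\lambda k} \quad\text{and}\quad h_U(\lambda) = \sum_{k=-K}^{K}\sigma_U(k)\,e^{i\lambda k}, \]

with $\sigma_L(k) = \sigma_L(-k)$ and $\sigma_U(k) = \sigma_U(-k)$, such that

\[ h_L(\lambda) \le h(\lambda) \le h_U(\lambda), \quad -\pi \le \lambda \le \pi, \qquad h_U(\lambda) - h_L(\lambda) \le \varepsilon. \]

Define

\[ \Sigma_L = ((\sigma_L(s-t))), \qquad \Sigma_U = ((\sigma_U(s-t))). \]

Now, for any arbitrary vector x,

\[ x'\Sigma x = \sum_{s,t=0}^{n-1}\sigma(s-t)\,x_s x_t = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_{s,t=0}^{n-1} x_s x_t\,e^{-i\lambda(s-t)}\,h(\lambda)\,d\lambda = \frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl|\sum_{s=0}^{n-1} x_s e^{-i\lambda s}\Bigr|^2 h(\lambda)\,d\lambda. \tag{1} \]

Similarly,

\[ x'\Sigma_L x = \frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl|\sum_{s=0}^{n-1} x_s e^{-i\lambda s}\Bigr|^2 h_L(\lambda)\,d\lambda, \qquad x'\Sigma_U x = \frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl|\sum_{s=0}^{n-1} x_s e^{-i\lambda s}\Bigr|^2 h_U(\lambda)\,d\lambda. \]

Then from (1),

\[ \int_{-\pi}^{\pi}\Bigl|\sum_{s=0}^{n-1} x_s e^{-i\lambda s}\Bigr|^2 h_L(\lambda)\,d\lambda \le \int_{-\pi}^{\pi}\Bigl|\sum_{s=0}^{n-1} x_s e^{-i\lambda s}\Bigr|^2 h(\lambda)\,d\lambda \le \int_{-\pi}^{\pi}\Bigl|\sum_{s=0}^{n-1} x_s e^{-i\lambda s}\Bigr|^2 h_U(\lambda)\,d\lambda \]

\[ \Longrightarrow\quad x'\Sigma_L x \le x'\Sigma x \le x'\Sigma_U x \]

for any vector x.

Now,

\[
\begin{aligned}
\frac{1}{n}\,\delta'\Sigma_U\,\delta
&= \frac{1}{2\pi n}\int_{-\pi}^{\pi}\Bigl|\sum_{t=0}^{n-1}\delta(t)\,e^{-i\lambda t}\Bigr|^2\Bigl(\sum_{k=-K}^{K}\sigma_U(k)\,e^{i\lambda k}\Bigr)d\lambda \\
&= \frac{1}{2\pi n}\sum_{k=-K}^{K}\sigma_U(k)\sum_{t,s=0}^{n-1}\delta(t)\,\delta(s)\int_{-\pi}^{\pi} e^{i\lambda(k-t+s)}\,d\lambda \\
&= \frac{1}{2\pi n}\,2\pi\sum_{k=-K}^{K}\sigma_U(k)\sum_{t\in S_k}\delta(t)\,\delta(t+k)
\end{aligned}
\]

(where $S_k = \{0, \ldots, n-1-k\}$ if $k \ge 0$ and $S_k = \{-k, \ldots, n-1\}$ if $k \le 0$)

\[ = \sum_{k=-K}^{K}\sigma_U(k)\Bigl\{\frac{1}{n}\sum_{t=0}^{n-1-|k|}\delta(t)\,\delta(t+|k|)\Bigr\}. \]

Hence

\[
\begin{aligned}
\limsup_{n}\frac{1}{n}\,\delta'\Sigma\,\delta
&\le \sum_{k=-K}^{K}\sigma_U(k)\Bigl\{\lim_{n}\frac{1}{n}\sum_{t=0}^{n-1-|k|}\delta(t)\,\delta(t+|k|)\Bigr\} \\
&= \sum_{k=-K}^{K}\sigma_U(k)\Bigl\{\int_{-\pi}^{\pi} e^{i\lambda k}\,\frac{dM(\lambda)}{2\pi}\Bigr\} \quad\text{[by assumption A3 of Chapter II]} \\
&= \int_{-\pi}^{\pi}\Bigl\{\sum_{k=-K}^{K}\sigma_U(k)\,e^{i\lambda k}\Bigr\}\frac{dM(\lambda)}{2\pi} \\
&= \int_{-\pi}^{\pi} h_U(\lambda)\,\frac{dM(\lambda)}{2\pi}.
\end{aligned}
\]

Similarly,

\[ \liminf_{n}\frac{1}{n}\,\delta'\Sigma\,\delta \ge \int_{-\pi}^{\pi} h_L(\lambda)\,\frac{dM(\lambda)}{2\pi}. \]

Therefore,

\[ \int_{-\pi}^{\pi} h_L(\lambda)\,\frac{dM(\lambda)}{2\pi} \le \liminf_{n}\frac{1}{n}\,\delta'\Sigma\,\delta \le \limsup_{n}\frac{1}{n}\,\delta'\Sigma\,\delta \le \int_{-\pi}^{\pi} h_U(\lambda)\,\frac{dM(\lambda)}{2\pi}. \]

Since $h_U(\lambda) - h_L(\lambda) \le \varepsilon$ and $\varepsilon$ is arbitrary, this completes the proof.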



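(b)
\[ \lim_{n\to\infty}\frac{1}{n}\,\delta'R_1(R_1+R_2)^{-1}R_1\,\delta = \int_{-\pi}^{\pi}\frac{f_1^2(\lambda)}{f_1(\lambda)+f_2(\lambda)}\,\frac{dM(\lambda)}{2\pi}. \]

Proof: Let $R_1^{\circ}$ be a matrix whose (s,t)th element is

\[ r_1^{\circ}(s-t) = \frac{1}{n}\sum_{m=0}^{n-1} f_1(\lambda_m)\,e^{i\lambda_m(s-t)} \]

and $R_1^{\circ-1}$ be a matrix whose (s,t)th element is

\[ r_1^{\circ-1}(s-t) = \frac{1}{n}\sum_{m=0}^{n-1}\frac{1}{f_1(\lambda_m)}\,e^{i\lambda_m(s-t)}. \]

Similarly, we define $R_2^{\circ}$ and $R_2^{\circ-1}$.

We have

\[ \delta'R_1^{\circ}(R_1^{\circ}+R_2^{\circ})^{-1}R_2^{\circ}\,\delta = \delta'(R_1^{\circ-1}+R_2^{\circ-1})^{-1}\delta. \]

The (s,t)th element of $R_1^{\circ-1}+R_2^{\circ-1}$ is

\[ \frac{1}{n}\sum_{m=0}^{n-1}\frac{f_1(\lambda_m)+f_2(\lambda_m)}{f_1(\lambda_m)\,f_2(\lambda_m)}\,e^{i\lambda_m(s-t)}. \]

Thus, the (s,t)th element of $(R_1^{\circ-1}+R_2^{\circ-1})^{-1}$ is

\[ \frac{1}{n}\sum_{m=0}^{n-1}\frac{f_1(\lambda_m)\,f_2(\lambda_m)}{f_1(\lambda_m)+f_2(\lambda_m)}\,e^{i\lambda_m(s-t)}. \]

Now we can write

\[ \delta'R_1^{\circ}(R_1^{\circ}+R_2^{\circ})^{-1}R_2^{\circ}\,\delta = \delta'R_1^{\circ}\delta - \delta'R_1^{\circ}(R_1^{\circ}+R_2^{\circ})^{-1}R_1^{\circ}\,\delta, \]

so that

\[
\begin{aligned}
\delta'R_1^{\circ}(R_1^{\circ}+R_2^{\circ})^{-1}R_1^{\circ}\,\delta
&= \delta'R_1^{\circ}\delta - \delta'R_1^{\circ}(R_1^{\circ}+R_2^{\circ})^{-1}R_2^{\circ}\,\delta \\
&= \sum_{m=0}^{n-1} f_1(\lambda_m)\,|D(\lambda_m)|^2 - \sum_{m=0}^{n-1}\frac{f_1(\lambda_m)\,f_2(\lambda_m)}{f_1(\lambda_m)+f_2(\lambda_m)}\,|D(\lambda_m)|^2 \\
&= \sum_{m=0}^{n-1}\frac{f_1^2(\lambda_m)}{f_1(\lambda_m)+f_2(\lambda_m)}\,|D(\lambda_m)|^2,
\end{aligned}
\]

where $D(\lambda) = \frac{1}{\sqrt{n}}\sum_{t=0}^{n-1}\delta(t)\,e^{i\lambda t}$ is the finite Fourier
transform of the mean difference function (see [5]) and
$\lambda_m = 2\pi m/n$ ($m = 0, 1, \ldots, n-1$).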


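We want to show that

(i)
\[ \lim_{n\to\infty}\frac{1}{n}\,\bigl|\delta'R_1^{\circ}\bigl((R_1+R_2)^{-1} - (R_1+R_2)^{\circ-1}\bigr)R_1^{\circ}\,\delta\bigr| = 0, \]

(ii)
\[ \lim_{n\to\infty}\frac{1}{n}\,\bigl|\delta'R_1(R_1+R_2)^{-1}R_1\,\delta - \delta'R_1^{\circ}(R_1+R_2)^{-1}R_1^{\circ}\,\delta\bigr| = 0. \]

Combining (i) and (ii), we get

\[ \lim_{n\to\infty}\frac{1}{n}\,\bigl|\delta'R_1(R_1+R_2)^{-1}R_1\,\delta - \delta'R_1^{\circ}(R_1+R_2)^{\circ-1}R_1^{\circ}\,\delta\bigr| = 0. \]

Hence,

\[
\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\,\delta'R_1(R_1+R_2)^{-1}R_1\,\delta
&= \lim_{n\to\infty}\frac{1}{n}\,\delta'R_1^{\circ}(R_1+R_2)^{\circ-1}R_1^{\circ}\,\delta \\
&= \lim_{n\to\infty}\frac{1}{n}\sum_{m=0}^{n-1}\frac{f_1^2(\lambda_m)}{f_1(\lambda_m)+f_2(\lambda_m)}\,|D(\lambda_m)|^2 \\
&= \int_{-\pi}^{\pi}\frac{f_1^2(\lambda)}{f_1(\lambda)+f_2(\lambda)}\,\frac{dM(\lambda)}{2\pi}
\end{aligned}
\]

(see [59]).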

Proof of (i) : Step 1 : We prove $\sup_p \sum_t |r_1^{tp}| < \infty$,
where $r_1^{tp}$ is the (t,p)th element of $R_1^{-1}$.

We have

\[ \|R_1\| = \sup_{\|x\|\le 1}\|R_1x\| = \sup_{\|x\|\le 1}\Bigl(\max_i\Bigl|\sum_{j=1}^{n} r_{1,ij}\,x_j\Bigr|\Bigr) = \max_i\Bigl(\sum_{j=1}^{n}|r_{1,ij}|\Bigr). \]

Hence, $\|R_1\| = \max_i\bigl(\sum_{j=1}^{n}|r_{1,ij}|\bigr)$.

Equivalently, $\|R_1^{-1}\| = \max_p\bigl(\sum_t |r_1^{tp}|\bigr)$.

$\|R_1^{-1}\|$ is bounded if $\|R_1\|$ is bounded away from zero;
this is proved in ([32]).



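Step 2 : We shall now show $\sum_p\sum_q |r_1(p-q) - r_1^{\circ}(p-q)| < \infty$.

\[
\begin{aligned}
r_1^{\circ}(p-q) &= \frac{1}{n}\sum_{m=0}^{n-1} f_1(\lambda_m)\,e^{i\lambda_m(p-q)}
= \frac{1}{n}\sum_{m=0}^{n-1} f_1\Bigl(\frac{2\pi m}{n}\Bigr)e^{i\frac{2\pi m}{n}(p-q)} \\
&= \frac{1}{n}\sum_{m=0}^{n-1}\Bigl[\sum_{\tau=-\infty}^{\infty} r_1(\tau)\,e^{-i\frac{2\pi m}{n}\tau}\Bigr]e^{i\frac{2\pi m}{n}(p-q)} \\
&= \sum_{\tau=-\infty}^{\infty} r_1(\tau)\,\frac{1}{n}\sum_{m=0}^{n-1} e^{i\frac{2\pi m}{n}(p-q-\tau)} \\
&= \sum_{l=-\infty}^{\infty} r_1(p-q+ln)
\end{aligned}
\]

(put $p-q-\tau = -ln$, $l = 0, \pm 1, \pm 2, \ldots$).

Thus,

\[
\begin{aligned}
\sum_{p,q=0}^{n-1}|r_1(p-q) - r_1^{\circ}(p-q)|
&= \sum_{p,q=0}^{n-1}\Bigl|r_1(p-q) - \sum_{l=-\infty}^{\infty} r_1(p-q+ln)\Bigr| \\
&= \sum_{\tau=-(n-1)}^{n-1}(n-|\tau|)\Bigl|r_1(\tau) - \sum_{l=-\infty}^{\infty} r_1(\tau+ln)\Bigr| \\
&= \sum_{\tau=-(n-1)}^{n-1}(n-|\tau|)\Bigl|\sum_{l\ne 0} r_1(\tau+ln)\Bigr| \\
&\le \sum_{\tau=-(n-1)}^{n-1}(n-|\tau|)\sum_{l\ne 0}|r_1(\tau+ln)| \\
&= \sum_{\tau=-(n-1)}^{n-1}(n-|\tau|)\Bigl\{\sum_{l\ne 0,\pm 1}|r_1(\tau+ln)| + |r_1(\tau+n)| + |r_1(\tau-n)|\Bigr\} \\
&= \sum_{\tau=-(n-1)}^{n-1}(n-|\tau|)\Bigl\{\sum_{l\ne 0,\pm 1}|r_1(\tau+ln)|\Bigr\}
 + \sum_{\tau=-(n-1)}^{n-1}(n-|\tau|)\,|r_1(\tau+n)|
 + \sum_{\tau=-(n-1)}^{n-1}(n-|\tau|)\,|r_1(\tau-n)| \\
&\le n\sum_{l\ne 0,\pm 1}\ \sum_{\tau=-(n-1)}^{n-1}|r_1(\tau+ln)|
 + n\sum_{\tau=0}^{n-1}|r_1(\tau+n)| + n\sum_{\tau=-(n-1)}^{0}|r_1(\tau-n)| \\
&\qquad + \sum_{\tau=-(n-1)}^{-1}(n-|\tau|)\,|r_1(\tau+n)| + \sum_{\tau=1}^{n-1}(n-|\tau|)\,|r_1(\tau-n)| \\
&\le 2\sum_{|v|\ge 1}|v|\,|r_1(v)|.
\end{aligned}
\]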
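
The aliasing identity at the heart of Step 2, $r^{\circ}(k) = \sum_{l} r(k+ln)$, can be checked numerically. The sketch below uses an illustrative covariance $r(\tau) = \rho^{|\tau|}$, whose spectral density $\sum_\tau r(\tau)e^{-i\lambda\tau} = (1-\rho^2)/(1-2\rho\cos\lambda+\rho^2)$ is known in closed form. This covariance, the truncation level `L`, and the function names are assumptions made for the example only.

```python
import numpy as np

rho, n = 0.6, 8
lam = 2 * np.pi * np.arange(n) / n
# Spectral density of r(tau) = rho^|tau| sampled at the Fourier frequencies.
f = (1 - rho**2) / (1 - 2 * rho * np.cos(lam) + rho**2)

def r_circ(k):
    """Circulant covariance: (1/n) * sum_m f(lambda_m) * exp(i*lambda_m*k)."""
    return np.real(np.sum(f * np.exp(1j * lam * k)) / n)

def r_aliased(k, L=200):
    """Folded covariance: sum over l of r(k + l*n), truncated at |l| <= L."""
    l = np.arange(-L, L + 1)
    return np.sum(rho ** np.abs(k + l * n))

max_gap = max(abs(r_circ(k) - r_aliased(k)) for k in range(-(n - 1), n))
print(max_gap)
```

The difference between $r(k)$ and $r^{\circ}(k)$ is therefore exactly the sum of the folded terms with $l \ne 0$, which is what the chain of inequalities bounds.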



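Step 3 : We prove

\[ \sum_{t,t'}|r_1^{tt'} - r_1^{\circ tt'}| \le \Bigl(\sup_p\sum_t |r_1^{tp}|\Bigr)\Bigl(\sup_q\sum_{t'}|r_1^{\circ qt'}|\Bigr)\Bigl[\sum_p\sum_q |r_1(p-q) - r_1^{\circ}(p-q)|\Bigr]. \]

Since for any non-singular matrices A and B, $A^{-1}(A-B)B^{-1} = B^{-1} - A^{-1}$,

\[
\begin{aligned}
\sum_{t,t'}|r_1^{tt'} - r_1^{\circ tt'}|
&= \sum_{t,t'}\Bigl|\sum_{p,q} r_1^{tp}\bigl(r_1(p-q) - r_1^{\circ}(p-q)\bigr)r_1^{\circ qt'}\Bigr| \\
&\le \Bigl(\sup_p\sum_t |r_1^{tp}|\Bigr)\Bigl(\sup_q\sum_{t'}|r_1^{\circ qt'}|\Bigr)\Bigl(\sum_{p,q}|r_1(p-q) - r_1^{\circ}(p-q)|\Bigr).
\end{aligned}
\]

Step 4 : Proof of (i) will be complete if we observe,
following the above steps together with ([59]), that

\[
\begin{aligned}
\bigl|\delta'R_1^{\circ}\bigl((R_1+R_2)^{-1} - (R_1+R_2)^{\circ-1}\bigr)R_1^{\circ}\,\delta\bigr|
&\le \Bigl(\sup_t|\delta(t)|\Bigr)^2\Bigl(\sup_t\sum_{t'}|r_1^{\circ}(t-t')|\Bigr)^2\Bigl(\sup_p\sum_t |r^{tp}|\Bigr)\Bigl(\sup_q\sum_{t'}|r^{\circ qt'}|\Bigr) \\
&\qquad\times\Bigl[\sum_p\sum_q |r(p-q) - r^{\circ}(p-q)|\Bigr],
\end{aligned}
\]

where r(p-q) = (p,q)th element of $(R_1+R_2)$.

Proof of (ii) : It follows immediately (see [59]) if
we write,

\[
\begin{aligned}
\bigl|\delta'R_1(R_1+R_2)^{-1}R_1\,\delta - \delta'R_1^{\circ}(R_1+R_2)^{-1}R_1^{\circ}\,\delta\bigr|
&\le \Bigl[\sup_s\Bigl|\sum_{u,t}\delta(u)\,r_1(u-t)\,r^{-1}(s-t)\Bigr| + \sup_t\Bigl|\sum_{u,s} r^{-1}(t-u)\,r_1^{\circ}(u-s)\,\delta(s)\Bigr|\Bigr] \\
&\qquad\times\Bigl(\sum_{s,t}|r_1(s-t) - r_1^{\circ}(s-t)|\Bigr).
\end{aligned}
\]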


APPENDIX F 


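Series Representation of a stochastic process

Theorem : Write

\[ \hat{x}(t) = \sum_{n=0}^{\infty} x_n\,\varphi_n(t). \]

Then

\[ E_j\,(x(t) - \hat{x}(t))^2 = 0 \]

if

\[ r_j(t,u) = \sum_{n=0}^{\infty}\beta_{jn}(u)\,\varphi_n(t) \quad\text{(for a given u)}, \]

where

\[ \beta_{jn}(u) = \int r_j(t,u)\,\varphi_n(t)\,dt \]

and

\[ \int\varphi_n(t)\,\varphi_m(t)\,dt = \delta_{nm}, \]

provided $E\,x^2(t) < \infty$.

Proof :

\[ E_j\,(x(t) - \hat{x}(t))^2 = E_j\,x^2(t) - 2E_j\,x(t)\hat{x}(t) + E_j\,\hat{x}^2(t). \]

Now,

\[
\begin{aligned}
E_j\,x(t)\sum_{n=0}^{\infty} x_n\varphi_n(t)
&= E_j\sum_{n=0}^{\infty} x(t)\Bigl(\int x(u)\,\varphi_n(u)\,du\Bigr)\varphi_n(t) \\
&= \sum_{n=0}^{\infty}\Bigl(\int r_j(t,u)\,\varphi_n(u)\,du\Bigr)\varphi_n(t) \\
&= \sum_{n=0}^{\infty}\beta_{jn}(t)\,\varphi_n(t) = r_j(t,t)
\end{aligned}
\]

and

\[ E_j\sum_{n,m=0}^{\infty} x_n x_m\,\varphi_n(t)\,\varphi_m(t) = \sum_{n,m=0}^{\infty}\Bigl(\int\beta_{jn}(u)\,\varphi_m(u)\,du\Bigr)\varphi_n(t)\,\varphi_m(t) \]

(since $E_j\,x_n x_m = \int\bigl(\int r_j(t,u)\,\varphi_n(t)\,dt\bigr)\varphi_m(u)\,du = \int\beta_{jn}(u)\,\varphi_m(u)\,du$)

\[
\begin{aligned}
&= \sum_{m=0}^{\infty}\Bigl\{\int\Bigl(\sum_{n=0}^{\infty}\beta_{jn}(u)\,\varphi_n(t)\Bigr)\varphi_m(u)\,du\Bigr\}\varphi_m(t) \\
&= \sum_{m=0}^{\infty}\Bigl\{\int r_j(t,u)\,\varphi_m(u)\,du\Bigr\}\varphi_m(t) \\
&= \sum_{m=0}^{\infty}\beta_{jm}(t)\,\varphi_m(t) \\
&= r_j(t,t).
\end{aligned}
\]

Hence $E_j\,(x(t) - \hat{x}(t))^2 = r_j(t,t) - 2r_j(t,t) + r_j(t,t) = 0$.
This completes the proof.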

A complex process {Z(t), t ∈ T} has a similar representation

Z(t) =1 <p„ct) 

n=o 

where 2(t) <P^(t)dt , ^^(t) ^(t}dt = 6 ^ 

For the proof, follow the above approach and note that in this case,


R(t,u) = E pj^(n) <P^(t) 
R(u,t) = R(t,u) = E 

R(t,u) = E B^(t') <P^(u) 





APPENDIX G 


The Dirac Delta function

Suppose that $\varphi(t)$ is any function which is continuous
at t = 0. Then the Dirac delta function $\delta_D(t)$ is such that

\[ \int_{-\infty}^{\infty}\delta_D(t)\,\varphi(t)\,dt = \varphi(0) \tag{1} \]

(see [14]). It is important to realize that $\delta_D(t)$ is
not a function. Rather it is a generalized function which
maps a function into the real line.

Even though $\delta_D(t)$ is a generalized function, it
can often be handled as if it were an ordinary function
except that we will be interested in the value of the
integral involving $\delta_D(t)$ and never in the value of $\delta_D(t)$
by itself.

The derivative $\delta_D'(t)$ of $\delta_D(t)$ can also be defined by

\[ \int_{-\infty}^{\infty}\delta_D'(t)\,\varphi(t)\,dt = -\varphi'(0), \]

where $\varphi'(0)$ is the derivative of $\varphi(t)$ evaluated at t = 0.
The justification for the above depends on integrating by
parts as if $\delta_D'(t)$ and $\delta_D(t)$ were ordinary functions and
using the following definition of the delta function (which is
heuristically useful),

\[ \delta_D(t) = \begin{cases} 0, & t \neq 0 \\ \infty, & t = 0 \end{cases} \]

such that

\[ \int_{-\infty}^{\infty}\delta_D(t)\,dt = 1. \]

We have

\[ \int_{-\infty}^{\infty}\delta_D'(t)\,\varphi(t)\,dt = -\int_{-\infty}^{\infty}\delta_D(t)\,\varphi'(t)\,dt = -\varphi'(0). \]
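
Both sifting properties can be illustrated numerically by replacing the delta function with a narrow Gaussian of width `eps`. This smoothing is an assumption made purely for illustration and is not part of the appendix; the test function $\varphi(t) = \cos t + 2t$ is likewise arbitrary, chosen so that $\varphi(0) = 1$ and $\varphi'(0) = 2$.

```python
import numpy as np

def gaussian_delta(t, eps):
    """Narrow Gaussian standing in for the delta function (illustrative only)."""
    return np.exp(-0.5 * (t / eps) ** 2) / (eps * np.sqrt(2 * np.pi))

def gaussian_delta_prime(t, eps):
    """Derivative of the Gaussian approximation."""
    return -t / eps**2 * gaussian_delta(t, eps)

t = np.linspace(-1.0, 1.0, 200001)
dt = t[1] - t[0]
phi = np.cos(t) + 2.0 * t                     # phi(0) = 1, phi'(0) = 2
eps = 1e-2

# Riemann sums approximating the two integrals.
sift = np.sum(gaussian_delta(t, eps) * phi) * dt
sift_prime = np.sum(gaussian_delta_prime(t, eps) * phi) * dt
print(sift, sift_prime)
```

As `eps` shrinks, the first integral approaches $\varphi(0)$ and the second approaches $-\varphi'(0)$, in line with the integration-by-parts argument.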

«»CO 





APPENDIX H 


Differentiation with respect to a matrix 


Let $Z = ((z_{jk}))$ be an n×m complex matrix. Let g
be a scalar valued function of Z. If we write $z_{jk} = x_{jk} + i\,y_{jk}$,
where $x_{jk} = \mathrm{Re}\,z_{jk}$ and $y_{jk} = \mathrm{Im}\,z_{jk}$, then g may be
considered to be a function of the 2mn variables $x_{jk}$, $y_{jk}$,
$1 \le j \le n$, $1 \le k \le m$. Now suppose g is a differentiable
function of these 2mn variables in some region of 2mn-dimensional
real Euclidean space. Then we define (see [40])
the derivative of g with respect to Z as the n×m matrix

For example, if $\xi$ and c are n-dimensional complex
vectors, then a direct application of this definition yields,


§t(c’£) = c 


(c’i) = o 


If Q is an arbitrary nxn complex matrix, then 


■(I QO = (Q+Q )i 


|j(|'qU = Q'f 

ro 


( 5 ) 



APPENDIX I 


Integration of Complex Stochastic Processes 


Let {Z(t), t ∈ T} be a complex stochastic process
defined on a probability space.

Let T = [a,b] be a closed finite interval.
Let n be an integer and

\[ a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b \]

be a partition, $\pi$, of T. In each sub-interval $[t_{k-1}, t_k]$
choose an arbitrary point and call it $t_k'$. The partition $\pi$
now becomes a marked partition, $\pi'$.

Define

\[ S_{\pi'} = \sum_{k=1}^{n} Z(t_k')\,(t_k - t_{k-1}), \qquad t_{k-1} \le t_k' \le t_k, \]

which, called the approximating sum, is a complex random
variable.

If $S_{\pi'}$ converges in mean square as the norm
$\|\pi\| = \max_{1\le k\le n}(t_k - t_{k-1})$ of the partition $\pi$ approaches zero,
the limit will be called the mean square integral of
{Z(t), t ∈ T} and we shall write

\[ \lim_{\|\pi\|\to 0} S_{\pi'} = \int_T Z(t)\,dt. \]

We shall give some sufficient conditions under which the
integral of a complex stochastic process exists.

Theorem (Miller [40], p. 105) : Let {Z(t), t ∈ T} be a
complex stochastic process with mean zero and covariance
function $R(t,s) = E\,Z(t)\overline{Z(s)}$. Let R be continuous on T×T.
Then Z(t) is mean square integrable on T (which is a finite
interval [a,b]), i.e.

\[ \int_T Z(t)\,dt \ \text{exists in m.s.} \]

Theorem (Miller [40], p. 109) : Let T = [a,b] be a
closed interval and let {Z(t), t ∈ T} be a complex stochastic
process with mean zero and covariance function R(t,s).
Let R be continuous on T×T and let g be a complex-valued
function of the real variable t which is continuous on T.
Then

\[ \int_a^b g(t)\,Z(t)\,dt \]

exists as a mean square integral.

Now, we shall see how we define the integral of a
complex process when T = [0, ∞).

Theorem (Miller [40], p. 14) : Let {Z(t), 0 ≤ t < ∞}
be a complex process with mean zero and covariance function
$R(t,s) = E\,Z(t)\overline{Z(s)}$. Let R be continuous on [0,∞)×[0,∞)
and let g be a complex-valued function of the real variable
t which is continuous on [0,∞). Let

\[ \lim_{b,b'\to\infty}\int_0^{b}\!\!\int_0^{b'} g(t)\,\overline{g(s)}\,R(t,s)\,dt\,ds \]

exist. Then

\[ I = \int_0^{\infty} g(t)\,Z(t)\,dt \]

exists as a mean square integral and

\[ E\,I = 0, \]

\[ E\,|I|^2 = \int_0^{\infty}\!\!\int_0^{\infty} g(t)\,\overline{g(s)}\,R(t,s)\,dt\,ds. \]

o o 



REFERENCES 


[1] Adhikari, B.P. and Joshi, D.D., ''Distance,
discrimination et résumé exhaustif'', Publ. Inst.
Statist. Univ. Paris, 5, 1956, pp. 57-74.

[2] Ali, S.M. and Silvey, S.D., ''A General class of
Coefficients of Divergence of One Distribution
from Another'', Journal of the Royal Statistical
Society, Ser. B, 28, 1966, pp. 131-142.

[ 3 ] Anderson, T.W., ''An Introduction to Multivariate 
Statistical Analysis'', John Wiley and Sons, Inc., 

1958. 

[ 4 ] Anderson, T.W. and Bahadur, R.R., ''Classification 
into two Multivariate Normal distributions with 
Different Covariance matrices'', Annals of Mathematical 
Statistics, 33, (June, 1962), pp. 420-431. 

[ 5 ] Anderson, T.W. , ''The Statistical Analysis of Time 
Series'', Wiley, New York, 1971. 

[6] Apostol, T.M. , ''Mathematical Analysis'', 2nd Edition, 
Addison-Wesley Publishing Company, Massachusetts, 

1981. 

[ 7 ] Ash, R.B. and Gardner, M.F., ''Topics in Stochastic 
Processes'', Academic Press, New York, 1975. 

[8] Barnard, M.M., ''The secular variation of Skull
characters in four series of Egyptian Skulls'',
Ann. Eug., 6, 1935, pp. 352-371.

[ 9 ] Bartlett, M.S. and Please, N.W., ''Discrimination 
in the case of zero mean differences'', Biometrika, 

50, 1963, pp. 17-21. 

[ 10 ] Bhat, N., ''Elements of Applied Stochastic Processes '' ,Ist/ 
2nd Edition, John Wiley and Sons, 1984. 

[ 11 ] Bhattacharyya, A., ''On a measure of divergence 
between two statistical populations defined by 
probability distributions''. Bull. Calcutta Math. 

Soc., 35, 1943, pp. 99-109. 

[12] Box, G.E.P. and Jenkins, G.M., ''Time Series analysis;
Forecasting and Control'', Holden-Day, San Francisco,
California, 1970.

[13] Cavalli, L.L., ''Alcuni problemi della analisi biometrica
di popolazioni naturali'', Mem. Ist. Idrobiol., 2, 1945,
pp. 301-323.

[14] Chatfield, C., ''The analysis of Time Series'', 2nd 
Edition, Chapman and Hall, London, 1980. 

[15] Chernoff, H. , ''A measure of asymptotic efficiency 

for tests of a hypothesis based on the sum of observations'', 
Ann. Math. Statist., 23, 1952, pp. 493-507. 

[ 16 ] Clunies-Ross , C.W. and Riffenburgh, R.H., ''Linear 
discriminant analysis'', Pacif. Sci., 14, 1960, 
pp. 251-256. 

[17] Fisher, R.A., ''The use of Multiple Measurements in
taxonomic problems'', Ann. Eug., 7, 1936, pp. 179-188.

[18] Fuller, W.A., ''Introduction to Statistical Time Series'', 
Wiley, New York, 1976. 

[19] Gilbert, E.S., ''The effect of unequal variance-covariance
matrices on Fisher's linear discriminant function'',
Biometrics, 25, 1969, pp. 505-516.

[20] Gill, P.E. and Murray, W., (Editors), ''Numerical
Methods for Constrained optimization'', Academic Press,
London, 1974.

[ 21 ] Giri, N.C. , ''Multivariate Statistical Inference'', 

Academic Press, New York, 1977. 

[22] Grettenburg, T.L., ''Signal Selection in Communication
and Radar Systems'', IEEE Trans. IT-9, Oct. 1963,
pp. 265-275.

[23] Han, C.P., ''Distribution of discriminant function
when covariance matrices are proportional'', Ann.
Math. Statist., 40, 1969, pp. 979-985.

[24] Han, C.P., ''Distribution of discriminant function in
circular models'', Ann. Inst. Statist. Math., 22, 1970,
pp. 117-125.

[25] Hellinger, E., ''Neue Begründung der Theorie quadratischer
Formen von unendlichvielen Veränderlichen'', J. für
die reine und angew. Math., 136, 1909, pp. 210-271.

[26] Kailath, T., ''The Divergence and Bhattacharyya distance
Measures in Signal Selection'', IEEE Trans. on Communication
Technology, COM-15 (No. 1), 1967, pp. 52-60.

[27] Kakutani, S., ''On equivalence of infinite product
measures'', Ann. Math., 49, 1948, pp. 214-224.

[28] Kolmogorov, A.N., ''On the approximation of distributions
of sums of independent summands by infinitely divisible
distributions'', Sankhya, 25, 1963, pp. 159-174.

[ 29 ] Koopmans, L.H., ''The Spectral Analysis of Time Series'', 
Academic Press, New York, 1974. 

[ 30 ] Kullback, S., ''An application of information theory 
to multivariate analysis'', Ann. Math. Statist., 23, 

1952, pp. 88-102. 

[ 31 ] Kullback, S. , ''Information Theory and Statistics'', 
Wiley, New York, 1959. 

[ 32 ] Liggett, W.S., ''On the asymptotic optimality of 
spectral analysis for testing hypotheses about time 
series''. Annals of Mathematical Statistics, 42, 1971, 
pp. 1348-1358. 

[ 33 ] Macon, N. , ''Numerical Analysis'' Wiley, New York, 

1963. 

[ 34 ] Mahalanobis, P.C., ''On the Generalized Distance in 
Statistics'', Proc. Natl. Inst. Science, India ;2, 

1936, pp. 49-55. 

[ 35 ] Martin, E.S., ''A study of the Egyptian series of 
mandibles with special reference to mathematical 
method of sexing'', Biometrika, 28, 1936, pp. 149-178. 

[36] Matusita, K., ''A distance and related statistics
in Multivariate Analysis'', Proceedings of the
International Symposium on Multivariate Analysis,
Ed. P.R. Krishnaiah, New York, Academic Press, 1966,
pp. 187-200.

[37] Matusita, K., ''On the notion of affinity of several
distributions and some of its applications'', Ann.
Inst. Statist. Math., 19, 1967, pp. 181-192.

[38] Matusita, K., ''Classification based on distance in
multivariate Gaussian case'', Proc. 5th Berkeley Symp.
Math. Stat. Prob., 1, University of California Press,
Berkeley, 1967, pp. 299-304.


[39] Matusita, K. , ''Some properties of Affinity and 
Applications'', Annals of the Institute of Statistical 
Mathematics, 23, 1971, pp. 137-155. 

[40] Miller, K.S., ''Complex Stochastic Processes'',
Addison-Wesley Publishing Company, Inc., London, 1974.

[41] Neyman, J. and Pearson, E.S., ''Contribution to the
theory of Statistical Hypotheses'', Statist. Res. Memo.,
1, 1936, pp. 1-37.

[42] Okamoto, M., ''An asymptotic expansion for the distribution
of linear discriminant function'', Ann. Math. Statist.,
34, 1963, pp. 1286-1301, (Correction: Ann. Math. Statist., 39,
1968, pp. 1358-1359).

[43] Papoulis, A., ''Probability, Random Variables and
Stochastic Processes'', 2nd edition, McGraw-Hill Book
Company, New Delhi, 1984.

[44] Parzen, E., ''Stochastic Processes'', Holden-Day, San
Francisco, 1962.

[45] Patnaik, P.B., ''The non-central χ² and F-distributions
and their approximation'', Biometrika, 36, 1949,
pp. 202-232.

[46] Pearson, K., ''On the coefficients of racial likeness'',
Biometrika, 18, 1926, pp. 105-117.

[47] Penrose, L.S., ''Some notes on discrimination'', Ann. Eug.,
13, 1947, pp. 228-237.

[48] Prasad, S., ''Design of Signals for Communication Systems'',
M. Tech. Thesis, Indian Institute of Technology, Delhi,
1971.

[ 49 ] Rao, C.R., ''Tests with discriminant functions in 
multivariate analysis'', Sankhya, 7, 1946, pp. 407-413. 

[50] Rao, C.R., ''The problem of classification and distance
between two populations'', Nature (London), 159, 1947a,
pp. 30-31.

[ 51 ] Rao, C.R., ''Statistical criterion to determine the 

group to which an individual belongs''. Nature, 160, 1947b, 
pp. 835-836. 



[52] Rao, C.R., ''The utilization of multiple measurements
in problems of biological classification'', Jour. Roy.
Statist. Soc., B, 10, 1948, pp. 159-203.

[53] Rao, C.R., ''On the distance between two populations'',
Sankhya, 9, 1949a, pp. 246-248.

[ 54 ] Rao, C.R., ''On some problems arising out of discrimination 
with multiple characters'', Sankhya, 9, 1949b, pp. 343-366. 

[ 55 ] Rao, C.R., ''Statistical Inference applied to classification 
problems'', Sankhya, 10, 1950, pp. 229-256. 

[56] Rao, C.R., ''Advanced Statistical methods in Biometric
Research'', New York, Wiley, 1952.

[57] Rao, C.R., ''Linear Statistical Inference and its
applications'', 2nd Edition, Wiley Eastern Private
Limited, New Delhi, 1973.

[58] Seber, G.A.F., ''Multivariate Observations'', Wiley,
New York, 1984.

[ 59 ] Shumway, R.H. and Unger, A.N. , ''Linear Discriminant 
Functions for Stationary time series''. Journal of the 
American Statistical Association, 69, (December, 1974), 
pp. 948-956. 

[60] Singh, S., ''Design and Analysis of Experiments for
Model Discrimination in Uniresponse and Multiresponse
Systems'', Ph.D. thesis, I.I.T. Kanpur, India, 1986.

[61 ] Smith, C.A.B., ''Some examples of discrimination'', 

Ann. Eug., 13, 1947, pp. 272-282. 

[62] Tricomi, F.G., ''Integral Equations'', Interscience 
Publishers, Inc., New York, 1970. 

[63] Van Trees, H.L., ''Detection, Estimation and Modulation 
Theory'', Part I, Wiley, New York, 1968. 

[64] Van Trees, H.L., ''Detection, Estimation and Modulation
Theory'', Part III, Wiley, New York, 1968.

[65] Vilenkin, N. Ya. , ''Method of Successive Approximation'', 

Mir Publishers , Moscow, 1979. 

[66] Wald, A., ''On a statistical problem arising in the 
classification of an individual into one of two groups'*, 
Ann. Math. Statist., 15, 1944, pp. 145-162. 



[67] Wald, A., ''Statistical Decision Function'', Wiley,
New York, 1950.

[68] Wald, A. and Wolfowitz, J., ''Characterization
of minimum complete class of decision function when
the number of decision is finite'', Proc. Berkeley
Symp. Prob. Statist., 2nd, California, 1950.

[69] Welch, B.L., ''Note on discriminant functions'', 
Biometrika, 31, 1939, pp. 218-220. 



